Tuesday 28 August 2012

Opa Tutorial - Intermission

Cedric Soulas of MLState - the creators of Opa - has very kindly put up some suggestions on writing code based on my tutorials. They're available via GitHub at https://github.com/cedricss/ian-oliver-tutorials . My code resides over on SourceForge.

While these are based on part 4 of the series, they are applicable to later parts and I will integrate them in due course.

One thing I do want to point out now is the use of a parser instead of an explicit pattern match dispatching URLs to specific functions. Very briefly, the code which reads:

function start(url)
{
  match (url)
   {
       case {path: [] ... }: hello();
       case {path: ["expressions" ] ... } : expressionsRESTendpoint();
       case {path: ["expressions" | [ key | _ ] ] ...} : expressionWithKeyRESTendpoint(key);
       case {~path ...}: error();
   }
}

Server.start(
   Server.http,
     [ {resources: @static_include_directory("resources")} , {dispatch: start} ]
);


can be rewritten using a parser:

start = parser {
   case "/": hello();
   case "/expressions": expressionsRESTendpoint();
   case "/expressions/" key=((!"/".)*) (.*): expressionWithKeyRESTendpoint(Text.to_string(key));
   default: error();
}

Server.start(
   Server.http,
   [ {resources: @static_include_directory("resources")} , {custom: start} ]
);


This performs more or less the same function, with the bonus that we obtain much more flexibility in how we process the URL, rather than treating it statically in the pattern matching. Note how we're no longer calling an explicit function but now have a parser as a first-class object :-)
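As a small illustration of that flexibility, here is a minimal sketch (untested, and simply reusing the handler functions above) of a stricter variant: it captures whatever follows the key and only accepts the request when nothing does, falling back to error() otherwise.

start = parser {
   case "/": hello();
   case "/expressions": expressionsRESTendpoint();
   case "/expressions/" key=((!"/".)*) rest=(.*):
      // only accept the request if nothing follows the key
      if (Text.to_string(rest) == "") {
         expressionWithKeyRESTendpoint(Text.to_string(key));
      } else {
         error();
      }
   default: error();
}

Whether trailing text is rejected, ignored or parsed further is now an ordinary programming decision inside the parser, rather than something fixed by the shape of a pattern.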


At this time there's relatively little documentation on the parser in Opa, but on the Opa Blog there's a tantalizing glimpse of what is possible.


So with that to whet your appetites go and download the latest nightly build and start coding...while I finalise composing part 5...

Tuesday 21 August 2012

Semantic Isolation and Privacy

I was somewhat side-tracked in writing these (pt1, pt2, pt2.5) and in thinking about how best to explain some of the issues, especially when getting to the deeper semantic levels. However, a work discussion about Apple's and Amazon's security flaws and the case of Mat Honan provided an interesting answer which I think describes the problem quite well.

In the above incident, hackers used information from Amazon primarily to social-engineer Apple's customer service into believing that they were Mat Honan. In the Wired article (linked above and [1]), Honan writes:

But what happened to me exposes vital security flaws in several customer service systems, most notably Apple’s and Amazon’s. Apple tech support gave the hackers access to my iCloud account. Amazon tech support gave them the ability to see a piece of information — a partial credit card number — that Apple used to release information. In short, the very four digits that Amazon considers unimportant enough to display in the clear on the web are precisely the same ones that Apple considers secure enough to perform identity verification. The disconnect exposes flaws in data management policies endemic to the entire technology industry, and points to a looming nightmare as we enter the era of cloud computing and connected devices.

Both accounts at Apple and Amazon have a user identifier and password; they also have a set of other criteria to establish whether a given human is who they say they are. In Amazon's case they ask for email, address and the titles of some books you've bought from them - at least they did the last time I called. In Apple's case it looks like they wanted some more personal information, in this case the "least significant digits" of a credit card number. I say least significant because these particular digits are often printed in plain text on receipts. As far as Visa, MasterCard, Diners etc are concerned, these digits have no meaning - though I have an issue with that, as we shall see.

In Amazon's context the data about a user is semantically isolated from Apple's context. This goes a level deeper than saying that both had user identifiers: it is about which Real World concept and instance those identifiers meant and represented. The trouble started with the realisation that the instance the Amazon identifier related to could be the same as the one the Apple identifier related to - in this case the Real World Mat Honan. To complete the picture, the four "meaningless" least significant credit card digits in Amazon's context turned out to be the proof of identity in Apple's context.

We can argue that the data and identity management procedures in both cases were at fault; in analysis, however, this was actually hard to see. How could four "random" digits effectively identify a person uniquely? And without an understanding of each other's semantic view of the world, who would have realised?

The whole hack described in Mat Honan's article goes into a lot more detail on how this information was found out. Indeed much of the information required is already public and it is just a case of putting all this together and effectively using that to make a consistent profile.

As for credit card numbers, the practice of displaying only the final four digits - the "least significant digits" - in certain semantic contexts is called PAN truncation. However, as the whole number has a well-defined structure (ISO/IEC 7812), a checksum and only a limited number of options for the remaining digits, it becomes feasible to reconstruct much of the number anyway, especially as some receipts also print the card type - at least enough to sound convincing in a social engineering situation if necessary. Furthermore, as described in the article, that well-defined structure means fabricating plausible credit card numbers becomes a method of generating data to "prove" identity in some cases. In summary, there are no "random digits" or "least significant digits" in a data structure where particular meanings are associated with each part of that structure.

The situation gets worse when more information can be provided for the social engineering exercise. For example, in Finland before chip-and-pin terminals it used to be common for a shop cashier to ask for identity: the customer would show a "valid identity document" (what counted varied by cashier, shop and, in some cases, day to day) and certain details would be written down - usually the last four (least significant, apparently) digits of a Finnish social security number, or a whole passport number, plus other varied details depending upon the shop, the phase of the moon, etc.


References

[1] Mat Honan (2012). How Apple and Amazon Security Flaws Led to My Epic Hacking. Wired, 6 August 2012.

Sunday 19 August 2012

Opa Language Tutorial: Part 4

Following on from part 3, we now quickly finish everything by handling all the GET, POST, PUT and DELETE verbs for our two cases:

GET
  • /expressions: return a list of expression identifiers, even if empty, with a 200 success code.
  • /expressions/"k": return the full details of expression "k"; if "k" doesn't exist, return a 404 error.
POST
  • /expressions: add the requested object if the key supplied in the object doesn't exist, returning a 201 success code as well as the key "k"; otherwise return a 400 error.
  • /expressions/"k": not allowed; return a 400 error.
PUT
  • /expressions: not allowed; return a 400 error.
  • /expressions/"k": modify the database with the given object if the supplied key both exists in the database and matches the key in the supplied object, returning a 200 success code; in all other circumstances return a 400 error.
DELETE
  • /expressions: not allowed; return a 400 error.
  • /expressions/"k": delete the database entry with the given key if it exists in the database, returning a 200 success code; in all other circumstances return a 400 error.

Let's go through each of the cases above in turn with minimal discussion of the code. I'm also going to tidy up the code a little, correcting the HTTP responses and making sure we're calling the correct messageSuccess and messageError functions.

For the case where we deal with URLs of the form http://xxx/expressions, i.e. without any key, the matching is done by the second case statement in the start function:

function start(url){
  match (url)   {
       case {path: [] ... }: hello();
       case {path: ["expressions" ] ... } : expressionsRESTendpoint();
       case {path: ["expressions" | [ key | _ ] ] ...} : expressionWithKeyRESTendpoint(key);
       case {~path ...}: error();
   }
}

which detects URLs without a key associated and passes these off to the expressionsRESTendpoint function:

function expressionsRESTendpoint(){
   match(HttpRequest.get_method())   {
      case{some: method}:
         match(method)         {
             case{get}:
                expressionsGet();
             case{post}:
                expressionsPost();
             case{put}:
                messageError("PUT method not allowed without a key",{bad_request});
             case{delete}:
                messageError("DELETE method not allowed without a key",{bad_request});
             default:
                 messageError("Given REST Method not allowed with expressions",{bad_request});
          }
      default:
          messageError("Error in the HTTP request",{bad_request});
   }
}


which matches the GET and POST verbs, and then anything else with an HTTP 400 Bad Request, plus a natural language error message.

Because network programming suffers from the leaky abstraction problem, we also need to catch the cases where we fail to get a method at all; the default: case of the outer match block handles this.

The functions expressionsPost and expressionsGet are as described earlier in part 3.

The case where a key is supplied with the URL is handled by the third case statement in the start function, and control is passed to the expressionWithKeyRESTendpoint function, which operates in the same way as the "without key" case:

function expressionWithKeyRESTendpoint(key){
   match(HttpRequest.get_method())   {
      case{some: method}:
         match(method)         {
             case{get}:
                expressionGetWithKey(key);
             case{post}:
                messageError("POST method not allowed with a key",{bad_request});
             case{put}:
                expressionPutWithKey(key);
             case{delete}:
                expressionDeleteWithKey(key);
             default:
                messageError("Given REST Method not allowed with expressions with keys",{bad_request});
         }
      default:
          messageError("Error in the HTTP request",{bad_request});
   }
}


The GET procedure is relatively straightforward: we check that a record with the given key exists and match the result accordingly:

function expressionGetWithKey(key){
    match(?/regexDB/expressions[key])    {
       case {none}:
          messageError("No entry for with id {key} exists",{bad_request});
       case {some: r}:
          Resource.raw_response(
             OpaSerialize.serialize(r),
             "application/json",
             {success}
          );
    }
}


Deletion is also relatively straightforward:

function expressionDeleteWithKey(key){
    match(?/regexDB/expressions[key])    {
       case {none}:
          messageError("No entry for with id {key} exists",{bad_request});
       case {some: r}:
          Db.remove(@/regexDB/expressions[key]);
          messageSuccess("{key} removed",{success});
    }
}


The expression: ?/regexDB/expressions[key]

is used to check existence, returning an option type which we then handle.

To remove a record from a database we use the function Db.remove, which takes a reference path to the record as a parameter. Note the use of the @ operator, which returns a reference path to the record in the database. Opa's database functions are fairly comprehensive and are better explained in Opa's own documentation - specifically, in this case, section 14.7.
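To summarise, here is a minimal, hypothetical helper (the function name is mine, purely for illustration) contrasting the three path forms we have used so far:

function dbPathsDemo(key) {
   // plain path: reads the stored value directly
   // (Opa gives a default value if the key is absent)
   stored = /regexDB/expressions[key];
   // ?/ path: reads optionally, giving {none} or {some: record}
   maybe_stored = ?/regexDB/expressions[key];
   // @/ path: a reference to the entry, which is what Db.remove expects
   Db.remove(@/regexDB/expressions[key]);
   (stored, maybe_stored)
}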

Now we get to the PUT case; to understand this properly we need to break this down:
  1. First we check if the request contains a body.
  2. If successful, we check that the supplied key exists
  3. If successful, we deserialise the body, which should be JSON
  4. If successful, we convert this to an Opa record and match it to the type regexExpression.
  5. If the key supplied in this object (the exprID field) matches the key used in the URL then we simply replace the record in the database, in the same manner as in our earlier POST function.
Otherwise, there is nothing really special about this particular function, though we do use an if...else structure for the first time and this should be familiar already.

function expressionPutWithKey(key){
  match(HttpRequest.get_body()) {
  case {some: body}:
    match(?/regexDB/expressions[key]) {
       case {none}:
          messageError("No entry with id {key} exists",{bad_request});
       case {some: _}:
          match(Json.deserialize(body)) {
             case{some: jsonobject}:
                match(OpaSerialize.Json.unserialize_unsorted(jsonobject)) {
                   case{some: regexExpression e}:
                      if (e.exprID == key) {
                         /regexDB/expressions[e.exprID] <- e;
                         messageSuccess("Expression with key {e.exprID} modified",{success});
                      }
                      else {
                         messageError("Attempt to update failed",{bad_request});
                      }
                   default:
                      messageError("Missing or malformed fields",{bad_request});
                }
             default:
                messageError("No valid JSON in body of PUT",{bad_request});
          }
    }
  default:
     messageError("No body in PUT",{bad_request});
  }
}


...and with that we conclude the initial development of the application or service (depending on your point of view), having shown basic database interaction, simple processing of URLs, handling of JSON serialisation/deserialisation and handling of the major HTTP methods: GET, POST, PUT and DELETE, plus some simple error handling.

The code up to this point is available on SourceForge and the file you're looking for is tutorial4.opa.

Now for a little discussion...is this the best we can do, and why Opa? To answer the second question first: Opa gives us a level of abstraction in that we've not had to worry about many aspects of programming internet based services that many other frameworks force upon us. In a later part we'll talk more about how Opa decides to distribute code between the client and server, plus how to use Opa's features for distributing code for scalability and fault-tolerance. So Opa is making a number of architectural decisions for us; in part this is embedded in the nature of Opa as a compiled rather than an interpreted language. Furthermore Opa is strongly typed, which means all (well, most) of our typing errors are already caught at compile time. This simplifies debugging and forces a more disciplined approach to programming; there is however a caveat (or two) to this last statement.

The code written here is fully functional, written in an agile manner (which may or may not be a good thing) and also written in a rigorous manner (refinement of specification, testing etc). What is wrong with the code is that it is profoundly ugly, and that comes from the style of development, which in this case has been based around implementing a set of individual use cases (eight of them: two families of access and four verbs).

While use case based development provides us with a good deal of the information we need about how our system should behave and in what cases - and indeed the individual use cases compose quite nicely; our program works, doesn't it? - it does not result in an elegant, maintainable piece of software that performs well or satisfies our needs well. For example, if we need to change how we access the database, or even reuse functionality, we end up reintroducing similar code again, and often trying to find every instance of code that performs that functionality and modifying each one consistently. Look how many times we've needed to check whether deserialisation needs to be performed, or notice how the patterns for each of the two major use case families (with keys and without keys) are broadly similar and yet we have repeated code.

What we're building up here is a massive amount of technical debt - once we've added code to manage sets of expressions this becomes painfully obvious; I'm not going to do that in this series, I want (need!) to rewrite this better now. Read the interviews with Ward Cunningham and Martin Fowler about this and you'll see why the code here isn't that elegant. In the next parts of this series I'll refactor the code, take more advantage of Opa's functional nature and show how architecting our design properly and worrying about separation of concerns produces more maintainable code with much higher levels of reuse.

Tuesday 14 August 2012

Opa Language Tutorial: Part 3

Continuing on from part 2...

I said last time that one should program defensively and stay in spec - two things I wasn't doing - so let's look at some mechanisms for doing this. We'll concentrate first on our specification for POST:

POST
  • /expressions: add the requested object if the supplied key in the object doesn't exist; return a 201 success code as well as the key "k", otherwise return a 400 error.
  • /expressions/"k": not allowed; return a 400 error.

Note that we actually have two cases to cater for, one with just the path /expressions and one with /expressions/"k" where "k" is some key. Opa's pattern matching helps greatly here and makes clear the distinction between the two cases which require different kinds of processing. Let's modify our dispatch function start()'s pattern matching:

match (url)
   {
       case {path: [] ... }: hello();
       case {path: ["expressions" ] ... } : expressionsRESTendpoint();
       case {~path ...}: error();
   }

Now we just match a path that contains the single element /expressions and call a function expressionsRESTendpoint(), this time without any parameters - we've captured nothing and we ignore everything else. As a test:

ian@U11-VirtualBox:~/opatutorial$ curl -X POST -T "regex1" http://127.0.0.1:8080/expressions
ian@U11-VirtualBox:~/opatutorial$ curl -X POST -T "regex1" http://127.0.0.1:8080/expressions/fred

The first command above matches, and if we examine the output on the terminal from our executable we'll see the successful "record added" output from the debugging statements. The second command above, with the longer path, does not match and ends up returning the output of the error() function. Excellent - this is what we want.

Expanding this more, let's write some skeleton code for the case where we do want to catch a key. Modify the start() function's match-case statements to:

match (url) {
       case {path: [] ... }: hello();
       case {path: ["expressions" ] ... } : expressionsRESTendpoint();
       case {path: ["expressions" | [ key | _ ] ] ...} : expressionWithKeyRESTendpoint(key);
       case {~path ...}: error();
   }

and add the skeleton code for our new handler function:

function expressionWithKeyRESTendpoint(key) {
    Debug.warning("expression with key rest endpoint {key}");
    Resource.raw_status({bad_request});
}

It is worth now explaining a little about lists and how functional programming languages present them:
  • [] is an empty list
  • [ 1 ] is a list containing a single element
  • [ 1, 2, 3 ] is a list containing 3 elements
However, internally lists are recursive structures and are usually treated such that we have a head element and a tail (if you've programmed in Lisp, ML, Haskell etc then this is already familiar): the list [ 1, 2, 3 ] is actually [ 1 | [ 2 | [ 3 ] ] ]

A non-empty list always contains a head and a tail. Given the list [1,2,3] the head is "1" and the tail is the list [2,3]. What is the head of the tail of the list [1,2,3]? "2", because the tail of [1,2,3] is [2,3] and the head of [2,3] is "2". This is a fantastically powerful way of thinking about, constructing and working with lists. I recommend a good book about functional programming [2]  (or even the one I contributed to [1]  <- that's a citation, not a list :-)
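As a minimal sketch (the function name and the zero default are mine, purely for illustration), the same head/tail decomposition can be written directly with Opa's pattern matching:

function head_or_zero(l) {
   match (l) {
      // the empty list has no head, so fall back to a default
      case []: 0;
      // [ head | tail ]: bind the head, ignore the tail
      case [ hd | _ ]: hd;
   }
}
// head_or_zero([1,2,3]) gives 1; head_or_zero([]) gives 0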

So what does that pattern we wrote mean?
  • Match against a list that, firstly has the head "expressions" and a tail. Note how this is already different from the earlier case where we just matched if the list had a head "expressions". 
  • The tail of the list must have a head which we bind to the variable "key"...
  • ...and may have a tail, which we ignore with the underscore "_" operator.
This still isn't precisely to specification as we shall see, but for the moment it works quite well and if we test it (test often!) as before, ie:

ian@U11-VirtualBox:~/opatutorial$ curl -X POST -T "regex1" http://127.0.0.1:8080/expressions
ian@U11-VirtualBox:~/opatutorial$ curl -X POST -T "regex1" http://127.0.0.1:8080/expressions/fred
ian@U11-VirtualBox:~/opatutorial$ curl -X POST -T "regex1" http://127.0.0.1:8080/expressions/fred/bloggs

and refer to the debug output. The first command inserts a record into our database and the latter two call our new handler function. Note the debug output for these latter two:

[Opa] Server dispatch Decoded URL to /expressions/fred
[Opa] Debug expression with key rest endpoint fred
[Opa] Server dispatch Decoded URL to /expressions/fred/bloggs
[Opa] Debug expression with key rest endpoint fred

The dispatcher is decoding the whole URL; the debug statement, however, prints only the second path element and nothing more, exactly as described in our pattern matching statement. Returning to why this isn't to spec: we should probably return an error if the path is too long - we haven't specified what happens in this case. Again, this is the defensive programming I'm going to ignore for the moment as the above works just fine. Personally, I'd not deploy this to production (or even beta) until it is fixed.
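For completeness, here is a hedged sketch of one way to enforce that stricter behaviour using nothing but the list patterns above: require the key to be followed by an empty tail, so that longer paths fall through to the error case.

match (url) {
       case {path: [] ... }: hello();
       case {path: ["expressions" ] ... } : expressionsRESTendpoint();
       // exactly two elements: "expressions" and a key whose tail is the empty list
       case {path: ["expressions" | [ key | [] ] ] ...} : expressionWithKeyRESTendpoint(key);
       case {~path ...}: error();
   }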

So that tidies up the handling of the /expressions cases and Opa's pattern matching handles the URL/URI quite naturally. So onto the next part which is some better error handling.

What we'll do here is add a few functions to report back better error messages using JSON (we made this architectural choice earlier) and look a little at records, strong typing and JSON serialisation at the same time.

The two functions for error reporting are quite simple:

function messageSuccess(m,c){
    Resource.raw_response(
      OpaSerialize.serialize({ success:m }),"application/json", c )
}

function messageError(m,c){
    Resource.raw_response(
      OpaSerialize.serialize({ error:m }),"application/json", c )
}


Both functions take a string and an HTTP response code as parameters. Opa infers the types of these based upon how they are used - in this case in the function Resource.raw_response. Working backwards, the last parameter "c" is the HTTP response code, which despite the naming of the functions can be any valid HTTP response code. We could add some code to check whether the usage of the function is semantically correct based on the natural language meaning of "error" or "success", but that's probably somewhat overkill (at least here anyway). The second parameter is a string which contains the mimetype of the response - this could be anything, but being well behaved we'll write "application/json".

The first parameter is interesting in that we require a string for the body of the response. We write:

OpaSerialize.serialize({ error:m })

which generates an Opa record with a single field "error" whose value is whatever was passed in the parameter "m". Actually the type of m could be anything that is valid as the type of a record field value, with a constraint as we shall see.
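For example (a tiny sketch; the literal value is mine), serialising such a record directly produces the JSON string we would expect:

// serialising a one-field record gives a one-field JSON object
body = OpaSerialize.serialize({ error: "Missing body" });
// body is now the string {"error":"Missing body"}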

To call these new functions we'll update our expressionsPost() function to call these as necessary:

function expressionsPost(){
  match(HttpRequest.get_body()){
  case{some: body}:
    match(Json.deserialize(body)){
       case{some: jsonobject}:
          match(OpaSerialize.Json.unserialize_unsorted(jsonobject)){
             case{some: regexExpression e}:
                /regexDB/expressions[e.exprID] <- e;
                messageSuccess("{e.exprID}",{created});
             default:
                messageError("Missing or malformed fields",{bad_request});
          }
       default:
          messageError("Failed to deserialised the JSON",{bad_request});
    }
  default:
     messageError("Missing body",{bad_request});
  }
}

I'll admit this is a bit of a nightmare to read, but the basic structure is simply:
  1. Is there a body present in the request, if so...
  2. Attempt to deserialise the body into JSON, and if this works:
  3. Attempt to map this into an Opa record of type regexExpression
Aside: There is a much more elegant way of writing this - at least to some, I'll write about that in a later edition.

Aside: I corrected the above code slightly...silly error on my part which only showed itself at runtime in some obscure situations...that's what exhaustive testing is for.

This record is passed into the function OpaSerialize.serialize, which takes a structure, such as a populated record, and serialises it as JSON. If we test the code now we see a response (certain fields removed for readability):

$ curl -i -X POST -T "jsonfiles/regex1" --noproxy 127.0.0.1 http://127.0.0.1:8080/expressions
HTTP/1.1 100 Continue

HTTP/1.1 201 Created
Content-Type: application/json; charset=utf-8

{"success":"abc"}


Note the perfectly formed JSON body, mimetype and response code.

Finally we'll write the code to process GET requests. We modify the code in expressionsRESTendpoint() as follows (and also call our new standardised error handling functions):

function expressionsRESTendpoint(){
   match(HttpRequest.get_method())   {
      case{some: method}:
         match(method)         {
             case{get}:
                expressionsGet();
             case{post}:
                expressionsPost();
             case{put}:
                messageError("PUT method not allowed without a key",{method_not_allowed});
             case{delete}:
                messageError("DELETE method not allowed without a key",{method_not_allowed});
             default:
                 messageError("Given REST Method not allowed with expressions",{method_not_allowed});
          }
      default:
          Resource.raw_status({bad_request});
   }
}


The first thing our new function has to do is query the database for its entries and then return these as a list inside a JSON object. We actually designed the database such that it already contains the keys as strings and if we recall how we entered records into the database, the exprID field was also used as the key. So we need to return a list of exprID fields from the database as a JSON object:

function expressionsGet() {
   collection = List.map(
                     function(i) { i.exprID },
                     StringMap.To.val_list(/regexDB/expressions)
                     );
   Resource.raw_response(
      OpaSerialize.serialize({expressions:collection}),
      "application/json",
     {success}
     )
}


/regexDB/expressions returns the whole database (there are optimisations for this kind of operation...you don't want to return multi gigabytes of data if you can help it) and we use the higher-order map function over the database to extract the exprID field of each record.

To make life simpler we map our hashtable structure to a list of values. The function StringMap.To.val_list performs this for us.

For each entry in that list, map applies an anonymous function which takes a parameter "i" of type regexExpression and returns its exprID field. How do we know the typing of this?

We stated earlier that /regexDB/expressions is a hashtable of records of type regexExpression. We extract from it just the values, ignoring the keys, and map then applies to each entry - an individual record of type regexExpression - a function which takes such a record and extracts the exprID field.
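Setting the database aside, the same higher-order pattern on a plain list might look like this minimal sketch (the list literal is made up for illustration):

// map an anonymous function over a plain list of records,
// extracting one field from each element
ids = List.map(
         function(i) { i.exprID },
         [ { exprID: "abc" }, { exprID: "zyx" } ]
      );
// ids is now the list of strings [ "abc", "zyx" ]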

The Resource.raw_response function performs the serialisation of the record in a similar manner as made in the two error and success functions described earlier.

Aside: There's actually a nice consistency check or invariant there to make sure that all keys actually match the record being addressed by that key. I'll leave it as an exercise to the reader on how to code such an invariant or check.

We can write this whole function a little more in the functional style as well by removing the local variable collection - actually a good compiler should optimise this out under suitable circumstances:

function expressionsGet()
{
   Resource.raw_response(
      OpaSerialize.serialize({expressions:
                              List.map(
                                function(i) { i.exprID },
                                StringMap.To.val_list(/regexDB/expressions) 
                                      )
                             }),
      "application/json",
     {success}  
   )
}

and if we test this (omitting some details from the response) we get a JSON object with a list of expression identifiers from our database:

$ curl -i -X GET  http://127.0.0.1:8080/expressions
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
{"expressions":["abc","zyx"]}

So that concludes the first part of the application - we've demonstrated JSON serialisation, the first set of simple POST and GET cases, handling errors and simple working with the database. In the next parts we'll develop the cases where we work with the keys and access specific entries in the database.

For now, there is one function we haven't mentioned to make the above complete, and that's how to handle the expressions with keys; simply return an error for the moment:

function expressionWithKeyRESTendpoint(key) {
    messageError("Not implemented yet",{bad_request});
}


Now, while writing this I discovered a bug or two in Opa and also had some ideas about how something should work - or at least how some of this gets used in a production environment. My plea here is that with any project of this nature - Open Source projects in general - ALWAYS report bugs, and if you have some good ideas then contribute them back. That way we make the community stronger, and the developers of these various open source projects get a better idea of how people are using their products, as well as the reassurance that they are actually being used. Which in turn leads to better software, which makes us more productive.

See you in part 4...


References

[1] Richard Bosworth (1995). A Practical Course in Functional Programming Using Standard ML. McGraw-Hill.
[2] Bruce MacLennan (1990). Functional Programming: Practice and Theory. Addison-Wesley.

Monday 13 August 2012

Perseid meteor or ....

This is in the right place and at the right time (the bright star just right of center is Deneb) so might just be a Perseid meteor:
[Photo: from the Astronomy album]
Clear sky the light pollution even out here in the countryside didn't make for optimal viewing conditions and neither did the camera settings in this instance. You can however just make out a very faint trace of the Milky Way. The other option for the streak in the picture are satellites, but according to my searches the only things in the vicinity were 3 GPS satellites (14,22,24) and passing through that area though moving orthogonally to the steak was Saudisat 1A (SO-41). NOAA-13 was too far to the right of Deneb to be a candidate.