Friday, May 6, 2016

ES6 Review, part-1

This is a first post reviewing the new features of JavaScript ES6.  It is also a short review of ES6 & Beyond, by Kyle Simpson (O'Reilly, (c) 2016), which I'm using to learn most of the new stuff.  Also on GitHub.  I will rate things using chess annotations, where an exclamation mark is "good" and a question mark is "dubious".

As for my background and biases, for 20+ years I was a traditional desktop app programmer, mainly in C and Java.  I'm fairly new (~4 years) to JavaScript.  I often appreciate, but sometimes hate, the differences between it and Java.  I've used NodeJS and some of its most common modules (Express, Request), but otherwise have used few of the mainstream JS libraries such as jQuery and Angular.


Intro

The book is moderately sized (~260 pages, 6" x 9" trade paperback) and meets O'Reilly's typical high standards for fonts, printing, binding, etc.  It's a little disappointing that they ran out of the traditional animal pictures to put on the covers.  :-)


Let's start with Chapter 2, which covers new Syntax.  There's a lot.

Block Scoping!!

Welcome to the 20th century, JavaScript.  Anything that reduces the need for an IIFE hack is a plus in my book.
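Here is a minimal sketch of what that buys you, using a hypothetical ID counter: the pre-ES6 idiom wraps private state in an IIFE just to get a new scope, while in ES6 a bare block plus let gives the same privacy.

```javascript
// Pre-ES6: an IIFE creates a function scope only to hide `counter`.
var nextIdOld = (function () {
  var counter = 0;
  return function () { return ++counter; };
})();

// ES6: a bare block with `let` keeps `counter` private, no IIFE needed.
let nextIdNew;
{
  let counter = 0;                 // visible only inside this block
  nextIdNew = () => ++counter;
}

console.log(nextIdOld(), nextIdOld());   // 1 2
console.log(nextIdNew(), nextIdNew());   // 1 2
console.log(typeof counter);             // undefined: counter did not leak out
```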

Let!!

Instead of var x, which is subject to the occasional bugs caused by hoisting or namespace pollution, using let x declares the variable with block scope from that point forward, much like C and Java.  Using that variable before its declaration results in a ReferenceError from the "Temporal Dead Zone" (TDZ).  This sounds confusing and the terminology is off-putting, but the simple answer is to declare your variables before using them!  Just like almost every other language.
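A small sketch contrasting the two behaviors (the function names are just for illustration): reading a var before its declaration quietly gives undefined, while reading a let variable in its TDZ throws.

```javascript
function hoistDemo() {
  var before = y;     // no error: the declaration of y is hoisted, the assignment is not
  var y = 42;
  return before;      // undefined
}

function tdzDemo() {
  let result;
  try {
    result = x;       // x exists but is still in its Temporal Dead Zone
  } catch (e) {
    result = e.name;  // "ReferenceError"
  }
  let x = 42;         // the declaration that ends the TDZ
  return result;
}

console.log(hoistDemo());   // undefined
console.log(tdzDemo());     // ReferenceError
```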

Unfortunately, the author goes off on a multi-page tangent describing his preferred version of let syntax, which is less conventional and differs from the C/Java tradition.  Since this alternate syntax was not adopted by ES6, the discussion is confusing and useless.  In general, I like the opinions Mr. Simpson expresses in the book, but this was one case where he went off base.

Let with a For Statement!

A minor benefit of let over var in a loop is that let redeclares the variable each iteration, so it is "sticky" if used in a callback.  Mr. Simpson provides a precise and clear example of this benefit.
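This is my own version of the classic demonstration, not the book's: three callbacks created in a loop either share one var binding or each capture their own let binding.

```javascript
var varCallbacks = [];
for (var i = 0; i < 3; i++) {
  varCallbacks.push(function () { return i; });   // all closures share one i
}

var letCallbacks = [];
for (let j = 0; j < 3; j++) {
  letCallbacks.push(function () { return j; });   // each iteration gets a fresh j
}

console.log(varCallbacks.map(function (f) { return f(); }));   // [ 3, 3, 3 ]
console.log(letCallbacks.map(function (f) { return f(); }));   // [ 0, 1, 2 ]
```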

const!

Immutability has many advantages, and in ES6 declaring a variable const makes this explicit: the variable cannot be reassigned.  Note that in the case of a reference to an object, say an array, only the reference is immutable, not the contents of the array.  That limitation is unfortunate but the same as in many other languages (e.g. final in Java).  Mr. Simpson provides a concise and clear example.
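A quick sketch of that limitation with a throwaway array: mutating the contents works, reassigning the binding throws a TypeError.  The Object.freeze line is an extra, shown only to note that it locks an object's own properties one level deep.

```javascript
const primes = [2, 3, 5];
primes.push(7);                     // allowed: the array's contents can still change

let reassignError = null;
try {
  primes = [11, 13];                // not allowed: the binding itself is constant
} catch (e) {
  reassignError = e.name;           // "TypeError"
}

// Object.freeze locks the array's own elements too, but only shallowly.
const frozen = Object.freeze([1, 2]);

console.log(primes);                    // [ 2, 3, 5, 7 ]
console.log(reassignError);             // TypeError
console.log(Object.isFrozen(frozen));   // true
```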

Block Scoped Functions

Meh, o.k. If you write a lot of complex recursive stuff it's useful, but I don't see a huge need for this in typical code, and there are some incompatibility issues.
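For reference, a minimal sketch of what a block-scoped function looks like (helper is a made-up name).  In strict mode the function exists only inside its block; in sloppy-mode code, legacy web-compatibility rules (Annex B of the spec) can also make it visible in the enclosing function, which is one source of those incompatibilities.

```javascript
function blockScopeDemo() {
  "use strict";
  let fromInside;
  {
    function helper() { return "called inside the block"; }
    fromInside = helper();
  }
  // In strict mode, helper was scoped to the block above.
  return [fromInside, typeof helper];
}

console.log(blockScopeDemo());   // [ 'called inside the block', 'undefined' ]
```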

Spread!  Rest (a.k.a. "Gather")!!

Both of these use "..." in your code, which might be slightly confusing at first, but I think in practice you'll get used to it, and it mimics existing C and Java syntax, so it was a good choice.

When ... is used before an iterable (see later), such as an array, it "spreads out" the individual elements of that iterable.  Mr. Simpson provides nice examples of how this can be used to conveniently replace apply() or concat().
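A short sketch of those two replacements, with arbitrary sample arrays.  The last line spreads a string, just to show that any iterable works, not only arrays.

```javascript
const nums = [4, 9, 2];
const more = [7, 1];

// Pre-ES6 idioms
const maxOld = Math.max.apply(null, nums);
const joinedOld = nums.concat(more);

// ES6 spread
const maxNew = Math.max(...nums);
const joinedNew = [...nums, ...more];

// Strings are iterable too, so they spread into characters.
const letters = [..."hey"];

console.log(maxOld, maxNew);   // 9 9
console.log(joinedNew);        // [ 4, 9, 2, 7, 1 ]
console.log(letters);          // [ 'h', 'e', 'y' ]
```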

In other positions, most notably a function's parameter list, ... gathers a group of values into an array.  The book calls this "rest" because it is gathering "the rest of the arguments".  I'd much prefer "gather" or "varargs", since a great place to use this syntax is to gather any variable arguments passed at the end of a function's argument list, e.g.

function foo(a, b, ...varargs) {}

This mimics behavior in other languages such as Python and Java.  Note that a common use case will be to rewrite code that used the old "array-like" arguments object.  Instead of chanting the mantra "Array.prototype.slice.call":

function foo() {
  var args = Array.prototype.slice.call(arguments);
  // now args is a real array
}

Just do

function foo(...varargs) {
}

Note: This is not explicitly mentioned in the book, but if there are no "rest of" arguments to gather, varargs will be set to an empty array, not undefined.  IMO this is correct behavior.
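A small sketch checking that behavior, next to the old arguments idiom for comparison (oldFoo is just an illustrative name).

```javascript
function foo(a, b, ...varargs) {
  return varargs;
}

// Old style: copy the array-like `arguments` object into a real array.
function oldFoo() {
  var args = Array.prototype.slice.call(arguments, 2);   // skip the first two
  return args;
}

console.log(foo(1, 2, 3, 4));              // [ 3, 4 ]
console.log(foo(1));                       // []: nothing to gather gives an empty array
console.log(Array.isArray(foo(1, 2, 3)));  // true
console.log(oldFoo(1, 2, 3, 4));           // [ 3, 4 ]
```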


Default Parameter Values!

These are superior to streams of
   x = x || 6;
in your code.
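The main reason they're superior: x || 6 replaces every falsy value, including a legitimate 0, while a default parameter only kicks in when the argument is undefined.  A minimal sketch with made-up function names:

```javascript
function widthOld(x) {
  x = x || 6;               // any falsy value (0, "", false, NaN, null) gets replaced
  return x;
}

function widthNew(x = 6) {  // only a missing or undefined argument triggers the default
  return x;
}

console.log(widthOld(0), widthNew(0));               // 6 0
console.log(widthOld(), widthNew());                 // 6 6
console.log(widthNew(undefined), widthNew(null));    // 6 null
```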

You can also call functions or invoke an IIFE in your default, but, IMO, this is probably getting too complex or cute.  The book hints at this and details several "gotchas".  But one nice use-case is to provide a default "do nothing" callback function in your function declaration, e.g.

function asyncFoo(url, callback=function(){}) {
}
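A fleshed-out sketch of that pattern.  The body is hypothetical: setTimeout stands in for real asynchronous work, and the URL is a placeholder.  Because of the default, the function can call callback without first checking whether one was passed.

```javascript
function asyncFoo(url, callback = function () {}) {
  // Hypothetical stand-in for real async work (e.g. an HTTP request to `url`).
  setTimeout(function () {
    callback(null, "fetched " + url);   // safe: callback is always a function
  }, 0);
}

asyncFoo("http://example.com/data");    // no callback passed, and no TypeError later
asyncFoo("http://example.com/data", function (err, result) {
  console.log(result);                  // prints: fetched http://example.com/data
});
```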


Destructuring ?!

Have to say, I don't see the point in this, the book doesn't provide any "killer use cases", and it's confusing.  If anybody does have a killer use case, please let me know.  Until then, count me a skeptic.

Let's say a function returns multiple results, e.g. in an array or an object.

function get3DCoordinates() {
  return [1,2,3];
}

function getLocation() {
  return {
     lat : 1,
     lon : 2
  };
}


Before ES6, you'd go

var c3 = get3DCoordinates();
// use c3[0], c3[1], c3[2] as needed...

or

var loc = getLocation();
// use loc.lat, loc.lon as needed...


With ES6, you can go

var [x,y,z] = get3DCoordinates();
// use x, y, z instead...

or

var { lat: lat, lon:lon } = getLocation();
// use lat, lon instead...


The syntax with the variables on the left is a bit confusing, but I could get used to it.  Fundamentally, I question the usefulness of this.  When a function returns multiple values, those values usually belong together.  If they don't belong together, why is a single function returning them?  You are likely violating the Single Responsibility Principle.

What's the advantage of splitting apart things that belong together?

As another counterexample, what if you needed the location of two things?  Which looks like better code to you?

var sanFran = getLocation(94101);
var seattle = getLocation(98101);
var dLat = sanFran.lat - seattle.lat;
var dLon = sanFran.lon - seattle.lon;

or

var { lat: latSF, lon: lonSF } = getLocation(94101);
var { lat: latSeattle, lon: lonSeattle } = getLocation(98101);
var dLat = latSF - latSeattle;
var dLon = lonSF - lonSeattle;

I prefer the old fashioned approach, but maybe your mind works differently.  What if you need the location of N things?

Pages 26-38 of the book cover many more complex cases of destructuring, with, IMO, no killer use case.  For example, is

var { model : { User } } = App;

really any improvement over

var User = App.model.User;

No.  The new syntax is confusing, it's more typing (in this case), and, most importantly, the syntax itself is complex.  Instead of a couple of dots flowing left to right in standard western 1,2,3 order, as we are all used to, you have to properly nest curly braces, and the order of the fields is all mixed up - not even reversed; the order is 2,3,1!  The programmer has to think, usually a bad thing.

Until I see a good use case for destructuring, it seems like a confusing addition to the language with little usefulness.  Mr. Simpson provides an example where you can combine preferences/settings from two different objects, e.g. user-preferences and default values.  However, the example only works if you know ahead of time (and put in the code) all of the possible fields.  In my experience, preferences expand over time and come from different projects, so this would be difficult or impossible to maintain.  Without a good use case, I don't see much value in destructuring.  What do you think about it?
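To make the limitation concrete, here is a sketch of that style of merge using hypothetical settings (my own example, not the book's code).  Every field has to be named in the destructuring pattern, so a field the code doesn't know about is silently dropped.

```javascript
// Hypothetical defaults; every field must also be named in the pattern below.
const DEFAULTS = { theme: "light", fontSize: 12 };

function mergeSettings(userPrefs = {}) {
  // Destructure user prefs, falling back to defaults for missing fields.
  const {
    theme = DEFAULTS.theme,
    fontSize = DEFAULTS.fontSize
  } = userPrefs;
  return { theme, fontSize };
}

console.log(mergeSettings({ fontSize: 16 }));
// { theme: 'light', fontSize: 16 }
console.log(mergeSettings({ fontSize: 16, language: "fr" }));
// { theme: 'light', fontSize: 16 }: "language" was dropped
```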

Next time, we will continue with even more syntax changes, starting with Object Literal Expressions.  See you then.







Thursday, March 24, 2016

Learning Google Maps

Inspired by the upcoming Race to Alaska, I put together a web page that integrates the nice JSON weather reports from the OpenWeatherMap API, plus some XML format reports from NOAA buoys.  These get drawn on a map with wind barbs.  You can scroll around to see various areas on the map, and click on a station to get more details.

It tracks the recent reports and attempts to compute an hourly rate of change for some statistics.  Depending on the time between reports this can be an inexact science.  But it helps to see if wind, pressure or temperature is changing rapidly.
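A minimal sketch of one way to compute such a rate, assuming a hypothetical report shape of { time, pressure } and an arbitrary 15-minute minimum gap between reports.  Both are my assumptions for illustration, not necessarily what the page does.

```javascript
// Assumption: gaps shorter than this are too noisy to turn into an hourly rate.
const MIN_GAP_HOURS = 0.25;

// Hourly rate of change of `field` between two reports with Date `time` values.
function hourlyRate(older, newer, field) {
  const hours = (newer.time - older.time) / (60 * 60 * 1000);   // ms -> hours
  if (hours < MIN_GAP_HOURS) {
    return null;   // too little time between reports to say anything useful
  }
  return (newer[field] - older[field]) / hours;
}

const earlier = { time: new Date("2016-03-24T10:00:00Z"), pressure: 1012.0 };
const later   = { time: new Date("2016-03-24T13:00:00Z"), pressure: 1009.0 };
console.log(hourlyRate(earlier, later, "pressure"));   // -1 (falling 1 unit per hour)
```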

The back end code is written in JavaScript using Node.js.  The front end is HTML and JavaScript and the Google Maps JavaScript API.

Monday, December 29, 2014

Do Frameworks get in the way? A tale of Python and PayPal IPN.

I was writing some basic Python3 CGI code to handle PayPal IPN posts.   PayPal docs here.   The IPN message authentication protocol has PayPal first POST a message to your URL.  After sending back a quickie 200 OK in response to the POST, you then POST back the "complete, unaltered message", with "the same fields (in the same order) as the original message".  I'm not sure this really matters, as I have seen code online that claims to work without worrying about ordering.  But, just to be robust, I wanted to follow the "correct" protocol.

If PayPal were sending a GET, one could use os.environ.get('QUERY_STRING').  But, for a POST, that returns None.  The Python cgi library provides a nice, standard, "handles lots of tricky cases" mechanism to read the POST fields, using cgi.FieldStorage().  However, that returns a dictionary-like object where the original order is not preserved.  I described this in a Stack Overflow question, and asked how one could get the data exactly as sent.  I mean, it was in a big String coming over the wire, e.g. "foo=bar&count=3",  right?  This should be simple.  HTTP can be complex, but this part isn't very tricky.

To my surprise, nobody answered, and not many people even viewed the question.  Maybe it was poorly worded.  I think the real reason might be that programmers are too used to using a library or framework, such as cgi.FieldStorage or Django, and don't understand what's actually going on deep underneath.   Not picking on Python programmers here, I think the same is true in most languages.

After some playing around, the answer is astoundingly simple.  The POST data is coming over the wire as a String, so just read it from stdin.

import sys
query_string = sys.stdin.read()

To POST everything back to PayPal, use this simple code.  (Should I worry more about encoding?)

import urllib.request

# PAYPAL_URL (defined elsewhere) is the IPN verification endpoint.
formData = "cmd=_notify-validate&" + query_string
req = urllib.request.Request(PAYPAL_URL, formData.encode())
req.add_header("Content-type", "application/x-www-form-urlencoded")
response = urllib.request.urlopen(req)
status = response.read().decode()   # PayPal answers "VERIFIED" or "INVALID"
if status != "VERIFIED":
    pass  # complain/abort/whatever
else:
    pass  # continue processing

There's one drawback: now you can't get cgi.FieldStorage() to work.  When it goes to read from stdin, there's nothing left, so it returns an empty dictionary.  (The Python cgi source code is here.)  So, if you also want the convenience of a dict for other purposes, such as checking various IDs or the price they paid, you need to create your own dict.  But that is also trivial:

import urllib.parse
multiform = urllib.parse.parse_qs(query_string)

Just like cgi.FieldStorage(), this returns a dictionary where the values are lists of Strings, since it is possible for a key to be repeated in a query, e.g.  foo=bar&foo=car.  However, in practice, this is rare, and doesn't apply for the PayPal case.  I guess you could always ask for the 0th item in the list - FieldStorage has some special methods for this.  To simplify things, I created a nice, simple, single-valued form with Strings for keys and values:

form = {}
for key, values in multiform.items():
    form[key] = values[0]   # keep only the first value for each key







Wednesday, November 12, 2014

Into the Clouds, deploying node.js with Modulus and OpenShift

My Agility website, www.nextq.info, is up and running on modulus.io.  I like Modulus.  It's easy to use, has been reliable, and you don't need to do a ton of heavy-duty Unix-ese command line stuff.  Their web interface does most of the work, and the simple command modulus deploy will update your codebase.  The main drawback is that they charge a small fee, $15 a month.  I haven't tried any scaling yet.

So lately I've also been playing with OpenShift.  It's free for small projects, and that even includes a little scaling.  It's definitely harder, more technical, and more "Unixy" than Modulus.  You deploy using git, and many commands must be done from the command line, not the web UI.  They have a free book to get you started, Getting Started with OpenShift.  After some fiddling, I got things going.

One major issue is that Modulus and OpenShift use different environment variables for important settings like the port and IP address.  So, if you want code that is portable across both, you will need something like this in your node code:

function setupConfig(config) {
   if (process.env.OPENSHIFT_APP_DNS) {
      config.port = process.env.OPENSHIFT_NODEJS_PORT;
      config.ipAddress = process.env.OPENSHIFT_NODEJS_IP || '127.0.0.1';
      config.mongoURI = process.env.OPENSHIFT_MONGODB_DB_URL;
      config.isOpenshift = process.env.OPENSHIFT_APP_DNS;
   }
   else if (process.env.MODULUS_IAAS) {  // modulus
      config.port = process.env.PORT;
      config.ipAddress = '0.0.0.0';  // modulus doesn't need an ip
      config.mongoURI = process.env.MONGO_URI;
      config.isModulus = process.env.MODULUS_IAAS;
   }

   // possibly more here...
   
   return config;
}

And use these values when you create the server, i.e.

app.listen(config.port, config.ipAddress, function(){
  ...
});


I have the "isXXX" fields so that you can set up platform-specific options like shutdown hooks.

For OpenShift you must change the package.json file to point to your main file.  OpenShift defaults to server.js, whereas many people use app.js.  Be sure to have the following lines in package.json with the correct name of your main file.

"scripts": {
    "start": "node app.js"
  },
"main": "app.js",

Finally, on a scaled platform, OpenShift (using the haproxy load balancer) "pings" your app every two seconds, quickly filling up the log file with confusing junk.  There are even three (duplicate) bugs for this: 918783, 923141 and 876473.  Their suggested "fix" is to run a cron job calling rhc app-tidy once in a while to clear out your logs.  This fixes the "too much space" issue, but you still have a big problem using the log file, because all these pings make it harder to see any real problems.  If you are brave, you could edit the haproxy.cfg file as hinted at (but not fully explained) in this StackOverflow post.  I chose an alternative.

My fix is to use Express to insert some middleware before the logger.  The "pings" can be recognized since they have no x-forwarded-for header.  Real requests should have that field, and that's also the value you want in the logfile.  At least, that works for me.

First, a function to ignore these pings and not call next().  Ever the fiddler, it is wrapped in another function so that it can still show a subset of the pings - you might want to see the pings every hour or so.

function ignoreHeartbeat(except) {
   except = except || 0;
   var count = 1;
   return function(req, res, next) {
      if (req.headers["x-forwarded-for"])
         return next();      // normal processing

      if (except > 0) {
         if (--count <= 0) {
            count = except;
            return next();   // let every Nth ping through to the logger
         }
      }

      res.end();             // answer the ping without logging it
   };
}

Then, in your app setup code, add this before you add the logger.  e.g. (Express 3 shown)

app.use(ignoreHeartbeat(1800));         // 1800 is once an hour
...
app.use(express.logger(myFormat));

Here is example log data, where ignoreHeartbeat was set to 10, so the pings should appear roughly every 20 seconds.  Note how the pings have no IP address.

Wed, 12 Nov 2014 22:08:36 GMT - - GET / 200 - 2 ms
Wed, 12 Nov 2014 22:08:56 GMT - - GET / 200 - 2 ms
Wed, 12 Nov 2014 22:08:59 GMT 50.174.189.32 - GET / 200 - 10 ms
Wed, 12 Nov 2014 22:08:59 GMT 50.174.189.32 - GET /javascripts/jquery-jvectormap-1.2.2.css 200 - 17 ms
  (more "real" GETs here...)
Wed, 12 Nov 2014 22:09:17 GMT - - GET / 200 - 3 ms
Wed, 12 Nov 2014 22:09:37 GMT - - GET / 200 - 1 ms

Monday, October 20, 2014

Web Scraping with node.js and Cheerio

I recently gave a talk at the BayNode Meetup, about my experiences web scraping for dog agility trials using node.js and the cheerio module.  The results are used for my website, www.nextq.info.

You can find the slides as Google Docs here:  Web scraping with cheerio.  Enjoy!


Wednesday, July 23, 2014

Groovy-Like XML for Java. Simple and Sane.

Parsing and navigating through XML in Java is a pain.  The org.w3c.dom.* classes are numerous, messy, and "old style", with no Collections, no Generics, no varargs.  XPath helps a lot with the navigation part, but is still a bit complex and messy.

Groovy, with XmlParser and XmlSlurper and their associated classes, makes this amazingly, dramatically easier.  Simple and Sane.  For example, Making Java Groovy Chapter 2 has an example that parses the Google geocoder XML data to retrieve latitude and longitude.  Below are the essentials of the code.  The full code, which is not much longer, is on GitHub here.

String url = 'http://maps.google.com/maps/api/geocode/xml?' + somemore...
def response = new XmlSlurper().parse(url)
stadium.latitude = response.result[0].geometry.location.lat.toDouble()
stadium.longitude = response.result[0].geometry.location.lng.toDouble()

The parsing is trivial, and navigating to the data (location.lat or location.lng) is also simple, following the familiar dot notation.

Can you do anything like this in pure Java?  Not quite.  So I wrote a small library, xen, to mimic much of how Groovy does things.  The full Geocoder.java code is here; a snippet is below:

String url = BASE + URLEncoder.encode(address);
Xen response = new XenParser().parse(url);

// Option 1: XPath slash style, 1-based indices
latLng[0] = response.toDouble("result[1]/geometry/location/lat");
latLng[1] = response.one("result[1]/geometry/location/lng").toDouble();

// Option 2: Groovy dot style, 0-based indices
latLng[0] = response.toDouble(".result[0].geometry.location.lat");
latLng[1] = response.one(".result[0].geometry.location.lng").toDouble();


Pretty close, eh?

The main difference is that we can't use the dot notation directly from an object, but we can use a very similar slash notation based upon XPath syntax. If you use XPath notation, one major difference from Groovy is that array indices in W3C XPath are 1-based, not 0-based.  Therefore note that we access the 1st element of result, not the 0th.  However, if the "path" starts with a . and a letter, as in the final example, the path is treated as a Groovy / "dot notation" style, with 0-based indices.
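To illustrate the index rule, here is a sketch of how a dot-style path could be translated into the equivalent XPath-style path.  It is written in JavaScript to match the other examples here, dotPathToXPath is a hypothetical helper, and it is not xen's actual implementation.

```javascript
// Hypothetical helper (NOT part of xen): converts a Groovy-style dot path with
// 0-based indices into an XPath-style slash path with 1-based indices.
function dotPathToXPath(path) {
  if (!/^\.[A-Za-z]/.test(path)) {
    return path;                        // not dot style: leave it untouched
  }
  return path
    .slice(1)                           // drop the leading "."
    .replace(/\[(\d+)\]/g, (_, i) => "[" + (Number(i) + 1) + "]")   // 0-based -> 1-based
    .split(".")
    .join("/");                         // dots become slashes
}

console.log(dotPathToXPath(".result[0].geometry.location.lat"));
// result[1]/geometry/location/lat
```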

So, if you want to greatly simplify parsing and navigating through XML, and/or you love how Groovy does things, please check out my (very beta!) xen library, which allows you to do it in Java.  Currently it is compiled against Java 6, but I think it should be fine in Java 5.  So if you need to support some Android device, or can't or don't want to integrate Groovy into your Java projects, this could be very useful.

Xen library
JavaDocs
README

The README discusses various design decisions, particularly, how my design converged upon many aspects of the Groovy design.   More discussion will appear in later posts.  And, be warned, this is still a very early version, 0.0.2, so there are probably bugs, some mistakes, and upcoming API changes.


Node for Java Programmers

At a recent BayNode Meetup, I gave a 15 minute presentation on "Node for Java Programmers".  Mainly notes on common things I did wrong coming from the Java world, and ideas or idioms to deal with them.

I got some good feedback and positive responses, and recently edited the presentation.

Here is a link to it (on Google Docs).