Dealing with metadata in Google Compute Engine

For my projects, I like to have a single "boot script" that runs at start-up time. That script then looks at some metadata in order to decide "what to do next".

This has some cool side-effects, but the biggest "feature" for me is that I can publish a single .tar.gz package of the application (using npm pack for example), and the boot script can rely on the instance's metadata to run any of the services needed for my app.

For those of you just looking for "how do I safely get metadata into a shell variable":

#!/bin/sh
getMetadata() {  
  # -f says "If there's a 404, just return nothing (fail)"
  # -s says "Shut up, please (silent)"
  curl -fs http://metadata/computeMetadata/v1/instance/attributes/$1 \
    -H "Metadata-Flavor: Google"
}
MYVAR=`getMetadata myvar`  
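
If you'd rather do the lookup from Python than from the shell, here is a minimal sketch of the same idea using only the standard library (Python 3). The names `build_metadata_request`, `get_metadata` and the `opener` parameter are my own for illustration, and the 5-second timeout is an assumption; the URL and header are the ones from the shell function above.

```python
import urllib.error
import urllib.request

# Same endpoint the shell function above uses.
METADATA_URL = "http://metadata/computeMetadata/v1/instance/attributes/"


def build_metadata_request(key):
    """Build the request for one custom metadata attribute."""
    return urllib.request.Request(
        METADATA_URL + key, headers={"Metadata-Flavor": "Google"})


def get_metadata(key, opener=urllib.request.urlopen):
    """Return the attribute's value, or '' if it is unset or unreachable."""
    try:
        with opener(build_metadata_request(key), timeout=5) as response:
            return response.read().decode("utf-8")
    except urllib.error.URLError:
        # HTTPError (e.g. a 404 for a missing key) subclasses URLError.
        return ""
```

Like `curl -fs`, this returns an empty string rather than raising when the key isn't set, so callers can test for "no value" the same way the shell scripts do.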

Now, onto some code.

boot.sh

This should run as the "startup script" for all machines. In my case, it's also deployed to GCS every time code hits the master branch in a Git repository.

Deploying it on every push isn't a problem because this main boot script doesn't do anything specific with the rest of the code. It only figures out the version of the package to download and runs the boot script for the service (which lives inside the package just downloaded).

In other words, the only job of this main boot script is to download the right package. The rest of the work is done by the scripts inside that package (which are version controlled).

#!/bin/sh

# Sometimes start-up scripts don't set root's home, which can be a problem.
export HOME=/root  
cd $HOME

# The method from above.
getMetadata() {  
  curl -fs http://metadata/computeMetadata/v1/instance/attributes/$1 \
    -H "Metadata-Flavor: Google"
}

# Install Node (general requirement).
curl -sL https://deb.nodesource.com/setup_4.x | bash -  
apt-get install -y nodejs

# Grab the version from the instance metadata.
VERSION=`getMetadata version`

# If there's no version set in metadata, figure out the latest one using `gsutil ls`.
if [ -z "$VERSION" ]; then
  # Note: version_sort.py is Python 2 (print statement, distutils).
  # Quote the delimiter so the shell leaves the Python source untouched.
  cat > version_sort.py << 'EOF'
import sys
from distutils.version import LooseVersion as Version
prefix, suffix = 'gs://a-gcs-bucket/myapp-', '.tgz\n'
lines = [line[len(prefix):-len(suffix)] for line in sys.stdin.readlines()]
print ''.join([prefix, sorted(lines, key=Version)[-1], suffix])
EOF
  # \d+ so that multi-digit versions such as 0.0.10 are matched too.
  PACKAGE=`gsutil ls gs://a-gcs-bucket | grep -P 'myapp-\d+\.\d+\.\d+\.tgz' | python version_sort.py`
else
  PACKAGE="gs://a-gcs-bucket/myapp-$VERSION.tgz"
fi

# Download the package from GCS.
gsutil cp "$PACKAGE" .

# Install the package and remove the bundle.
npm install "`basename "$PACKAGE"`"
rm "`basename "$PACKAGE"`"

# Figure out which service boot script to run and run it, if applicable.
SERVICE=`getMetadata service`  
if [ -n "$SERVICE" ]; then
  sh node_modules/myapp/scripts/$SERVICE-boot.sh
fi  

This script should be set to run at start-up time, ideally using the startup-script-url metadata parameter.
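
The "find the latest version" step in boot.sh leans on `distutils`, which was removed from the standard library in Python 3.12. As a rough sketch of the same selection in Python 3 without it, assuming plain numeric `X.Y.Z` versions (multi-digit components allowed) and the same placeholder bucket and package names as the script; `latest_package` is a name I made up for illustration:

```python
import re

# Placeholder bucket and package names, copied from boot.sh above.
PATTERN = re.compile(r"^gs://a-gcs-bucket/myapp-(\d+)\.(\d+)\.(\d+)\.tgz$")


def latest_package(listing):
    """Return the highest-versioned package path in a `gsutil ls` listing."""
    candidates = []
    for line in listing.splitlines():
        path = line.strip()
        match = PATTERN.match(path)
        if match:
            # Compare versions numerically, so 0.0.10 sorts above 0.0.9.
            version = tuple(int(part) for part in match.groups())
            candidates.append((version, path))
    if not candidates:
        return None
    return max(candidates)[1]
```

Comparing tuples of integers is what keeps `0.0.10` ahead of `0.0.9`; a plain string sort would get that wrong.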

[service]-boot.sh

This script is packaged with the app and should "start up" your specific service. In this case, we're calling the service api, so the file is named api-boot.sh.

#!/bin/sh

# Install nginx
apt-get install -y nginx

# Update nginx configuration
service nginx stop  
rm /etc/nginx/sites-enabled/*  
cp node_modules/myapp/config/nginx.conf /etc/nginx/sites-available/api.myapp.com  
ln -s /etc/nginx/sites-available/api.myapp.com /etc/nginx/sites-enabled/api.myapp.com  
service nginx start

# Install pm2
npm install -g pm2@latest

# Run the app
export NODE_ENV=production  
pm2 start node_modules/myapp/api/app.js --name "api.myapp.com"  

To make this all work, just start your machine with metadata for:

  • startup-script-url: gs://your-bucket/boot.sh
  • service: api
  • version: 0.0.5 (optional, but fill in with a version uploaded to your bucket)
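
As a sketch of how those three values might be passed when creating the machine, here is a small Python 3 helper that assembles a `gcloud compute instances create` command with a `--metadata` flag. The function name and the instance name `my-instance` are hypothetical, and this only builds the argument list; it doesn't run anything.

```python
def create_instance_command(name, startup_script_url, service, version=None):
    """Return the argv list for creating an instance with boot metadata."""
    pairs = [("startup-script-url", startup_script_url), ("service", service)]
    if version:
        # Optional: without it, boot.sh falls back to the newest package.
        pairs.append(("version", version))
    metadata = ",".join("%s=%s" % pair for pair in pairs)
    return ["gcloud", "compute", "instances", "create", name,
            "--metadata", metadata]


command = create_instance_command(
    "my-instance", "gs://your-bucket/boot.sh", "api", version="0.0.5")
print(" ".join(command))
```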

For more details on metadata and startup scripts, check out Google's official Compute Engine documentation.

Copyright in Jamaica

DISCLAIMER: I’m not a lawyer. Get your legal advice from someone smarter. If you're interested in this topic, pick up a copy of Cyber Law in Jamaica.

TL;DR

The Paymaster v GKRS case should be a reminder to all of us that being explicit about ownership and confidentiality is important. This means we should be really careful in a few situations:

  1. If you are sharing confidential information (source code, really specific strategy docs, etc) have your attorney draft an NDA (non-disclosure agreement).

  2. If you are writing software or having software written for you (or writing software for your start-up), have your attorney draft an IPIAA (intellectual property and invention assignment agreement) outlining exactly who owns what.

The longer version

Recently, the Court of Appeal ruled on a case involving Paymaster and GraceKennedy Remittance Services which had some pretty interesting implications related to confidentiality and intellectual property ownership. This is even more interesting to me because the IP in question was actually software, so it means that we in the tech industry should see this as a friendly reminder of what we should be doing already.

But before I get into specifics, I think it's worthwhile to quickly recap this 'work-for-hire' thing.

What is a 'work-for-hire'?

Put really simply: in the US, there's a law somewhere that says "if I pay you to produce some work, I own it, not you." There are all sorts of caveats on the type of work it can be, what the working arrangement is (employee versus contractor), and some common law tests that courts use when it's unclear who owns the work produced, but the underlying implication is actually pretty important:

If there's no contract specifying who owns what, ownership is subject to debate.

(If you want to know more, Wikipedia's page on Work for hire is a good place to start.)

Does Jamaica have a work-for-hire concept?

No. The Copyright Act is pretty explicit in saying that if there is no contract, the author owns the work. End of story.

So what happened in Paymaster v GKRS?

There were two topics at issue, which I'll address separately:

  1. Confidentiality ("Duty of Confidence"), and
  2. Intellectual property ownership ("Copyright").

Let's start with the Confidentiality piece. Here's my understanding of the situation:

  • Paymaster shared some super confidential stuff with GKRS.
  • The intent of the two was to start some sort of business arrangement.
  • There was no written confidentiality agreement.
  • GKRS had the burden of proving that there was no duty of confidence.
  • The court said GKRS carries a heavy burden if they seek to repel the contention that they were bound by an obligation of confidence. (See paragraphs 187-188 in the appeal ruling.)
  • In short, the court decided that GKRS failed to show that they didn't use the business plans of Paymaster to gain "unfair advantage".

There's nothing all that new here, but this should serve as a friendly reminder that we should be doing a few things (if we aren't already):

  1. If you are going to share super confidential stuff, have something in writing that says "we both agree to keep all of this between us and not use it unfairly".
  2. If you think the company you're sharing stuff with would use it to gain an unfair advantage, don't fucking share it.

NOTE: I still think if you're sharing "just an idea", don't bother with paperwork (see http://www.geewax.org/protecting-your-idea/ for my thoughts on this).

And what about the Intellectual property ownership piece? Here's my understanding of the situation:

  • Paymaster hired Lowe to work on software for remittances.
  • Paymaster was paying Lowe for his work.
  • There was no written agreement saying who owned what.
  • Lowe took that software with him, left Paymaster, and later licensed it to GKRS.
  • The court said that Lowe owned the copyright and was perfectly within his rights to do with the software what he wanted.

So... what does this mean exactly? This case reaffirms something really important (and different from the US):

In Jamaica, if you have no contract in place saying who owns the software, the author owns it.

This also means we should be careful when creating intellectual property:

  • If you're hiring someone to write software, get a written agreement saying who owns what.
  • If you've been hired to write software, get a written agreement saying who owns what.
  • If you are starting a company and plan to write software, you fall under both points above. Get an agreement saying your company owns the software.
  • If you deviate from the original "scope of work" specified in your written agreement (either as the employer or the author), talk about it beforehand and update the agreement if necessary.

Here's more stuff to read...

Net neutrality in Jamaica

I'm not sure if you've been keeping up on the net-neutrality debates happening in the US, but I wanted to take a moment to propose a less severe version of this for Jamaica based on an article I read about Digicel blocking certain services they deem to be competitive.

First, some background.

The net-neutrality debates in the US are focused on whether or not ISPs should be allowed to offer "internet fast lanes" for a surcharge. The debate centers on the issue of smaller companies (i.e., startups) not being able to afford those faster speeds, and therefore being at a competitive disadvantage.

For example...

If Big Company had a video sharing website that was terrible but they paid the extra $100m USD per year to keep it fast, even a better video sharing website (think YouTube when it was starting) wouldn't be able to compete because it would be considered too slow compared to Big Company's site.

So where is the debate?

The debate I think we should be having in Jamaica is slightly less extreme than the one happening in the US.

I think we should consider drafting legislation similar to the Netherlands’ Telecommunications Act that says that ISPs cannot deliberately throttle traffic based on the content or purpose of that traffic (barring a few exceptions including throttling for congestion, integrity and security of the network, spam, or legal reasons).

If Digicel, LIME, or Flow decide to add a "faster lane" of traffic, we should debate the legality of charging for that later. For now, I'm saying no one should be slowed down.

Why does this even matter?

If Jamaica is serious about relying on innovation and investment in technology to level the playing field when competing in a global market, we cannot be at the whim of an Irish telecom's top-line revenue or quarterly targets.

It would be terrible to see a Digicel executive, in an attempt to meet revenue targets for the quarter, say "Let’s block WhatsApp so that people have to spend a bit more on text messages this month" or "Let’s charge WhatsApp a fee to allow their app to be used on our network".

Investment is already risky enough, and without any guarantee that Jamaican software companies won't be arbitrarily cut off (or worse: held hostage) by an ISP, it isn't reasonable to continue investing in technology in the country.

Put simply: I won't be able to reasonably invest in tech companies targeting Jamaica.

  • As an investor in 10 Pound Pledge, I have to face the very real possibility of losing Caribbean customers based on an ISP not willing to deliver high-quality video content.

  • As an investor in EduFocal, I have to face the very real possibility that educational videos won't be delivered to customers without a hefty "transfer fee" (to be decided at the whim of the ISP).

  • As an investor in Blaze, I have to face the very real possibility that Digicel will build their own mobile money solution, and Blaze (now deemed a competitor to Digicel) will find itself blocked from Digicel subscribers.

So what now?

I hope that Jamaican regulators and policymakers understand the implications of what they are allowing their telecom providers to do -- and the unintended and potentially disastrous consequences.

As an investor, I can certainly say that this move will make Jamaica-focused tech companies less competitive and less attractive for investment. It seems quite hypocritical for this to happen at a time when the government and the IMF are telling the Jamaican people that their main focus is on growing the economy.

Similarly, I hope that Jamaicans take a stand and tell their representatives that they won't tolerate discrimination against their data; that blocking traffic an ISP feels is "competitive" is unacceptable; that without treating all data the same, true innovation will be stifled just when we're starting to see great progress in Jamaica.

Safe deployment on App Engine

In a previous article I wrote about the need to update your Datastore indexes on App Engine before deploying code that relies on them. To simplify the deployment process, I put together a script that waits for your indexes to be built before deploying new code.

#!/usr/bin/env python

# Need to import and fix the sys path before importing other things.
import remote_api_shell  
remote_api_shell.fix_sys_path()

import time

from google.appengine.api import datastore_admin  
from google.appengine.ext.remote_api import remote_api_stub  
from google.appengine.tools import appengine_rpc


APP_ID = 'your-app-id'


def configure_remote_api():  
  def auth_func():
    return ('your.email@gmail.com', 'your.application.specific.password')

  remote_api_stub.ConfigureRemoteApi(
      APP_ID, '/_ah/remote_api', auth_func,
      servername='%s.appspot.com' % APP_ID,
      save_cookies=True, secure=False,
      rpc_server_factory=appengine_rpc.HttpRpcServer)
  remote_api_stub.MaybeInvokeAuthentication()


def main():  
  print 'Checking indexes...'

  configure_remote_api()

  interval = 10  # seconds.
  building = True

  while building:
    # Start with a fresh run: maybe we're done building?
    building = False

    for index in datastore_admin.GetIndices('s~%s' % APP_ID):
      # If any single index is building, we're not done.
      # Sleep for a bit and then this cycle should repeat.
      if not index.has_state() or index.state() != index.READ_WRITE:
        building = True
        print 'Indexes are still building... Waiting %s seconds.' % interval
        time.sleep(interval)
        break

  print 'All indexes are up to date.'


if __name__ == '__main__':  
  main()

Since this script will “block” until all indexes are built and ready, you can call it from your deployment script, which should look something like:

$ appcfg.py update_indexes .
$ ./wait_for_indexes.py  # (The script above...)
$ appcfg.py update .
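
One thing to watch for: as written, the loop in the script above never gives up, so an index that never reaches `READ_WRITE` would hold up the deployment indefinitely. Here is a minimal Python 3 sketch of the same polling pattern with a timeout added. `wait_until` and its parameters are hypothetical names of mine, not part of the App Engine SDK, and the 10-minute default is an arbitrary assumption.

```python
import time


def wait_until(is_ready, interval=10, timeout=600, sleep=time.sleep,
               clock=time.monotonic):
    """Poll is_ready() until it returns True; return False on timeout."""
    deadline = clock() + timeout
    while not is_ready():
        if clock() >= deadline:
            return False
        sleep(interval)
    return True
```

In the script above, `is_ready` would be a function that returns True only when every index reports `READ_WRITE`, and a False result from `wait_until` would be the cue to exit with a non-zero status so the final `appcfg.py update .` never runs.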

Protecting your idea

TL;DR: Requiring someone to sign an NDA just to hear your idea is unreasonable.

Every so often, I get an e-mail asking me for feedback on an idea, with a generic NDA attached that I'm asked to sign before I can hear what the idea is... I won't do that. And I'll let others much more experienced than I am explain why:

(The short version of each of these is basically: "I can't and won't sign an NDA to hear your pitch or see your business plan.")

I’m happy to give feedback (who knows if it’ll even be useful to you), however the points in Carol Roth's article summarize the theme very well:

  1. Your idea already exists (or being new can be a problem)
  2. The value is not in the idea
  3. Sharing ideas makes them better

If you’re going to share anything generally considered “secret sauce” (like access to your code repository or the special formula for the Red Bull competitor you’re starting), asking for an NDA is a reasonable request -- you have to protect your property. A first e-mail typically wouldn’t be the place to discuss that level of detail, though.

If you're just sharing the high level idea, see it for what it is: something that hasn’t materialized yet and will likely change a lot as you work more on it.

Python dictionary comprehensions

I’m a little embarrassed to say that I haven’t yet used these in my own code -- they seem so obvious now that I look back on it.

>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> keys = ['a', 'c']
>>> { k: d[k] for k in keys }
{'a': 1, 'c': 3}
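
One caveat: `d[k]` raises a `KeyError` if any of the requested keys is missing. A small sketch of two ways around that (the extra key `'z'` and the default of `0` are just for illustration):

```python
d = {'a': 1, 'b': 2, 'c': 3}
keys = ['a', 'c', 'z']  # 'z' is deliberately missing from d

# Skip keys that are not present instead of raising KeyError.
subset = {k: d[k] for k in keys if k in d}

# Or keep every requested key and fall back to a default value.
with_default = {k: d.get(k, 0) for k in keys}
```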