Thoughts

On the Theory of Deploy

March 12, 2014

Just some loosely related thoughts about producing software.

Unavoidable

Regardless of whether you’re Agile or doing some standard Waterfall practice, you inevitably do the following things:

Determine what the application should do
Write code
Deploy the program
Check that the program does what it should do

These steps are absolutely necessary. Even if you think you skip the fourth one, the end user of your application will do it anyway. And the first three cannot be skipped either: to check that the program does what it should do, you obviously have to make it do that first.

Artifacts

What we work with when developing a software application is the codebase.

This codebase is expected to be run by some runtime. Runtime is a general concept here: it can be either software, as in the case of the Java VM, or hardware, as in the case of compiled C++ code.

When the runtime executes the codebase, the code can access other programs (“applications”) or OS facilities such as the file system, network sockets, or I/O channels for human/machine interaction. Together, all this third-party stuff is the environment inside which the codebase is executed, the runtime being a part of that environment.

A bare codebase cannot be executed in an arbitrary environment. It either:

Is written for one specific environment and is completely useless in any other.
Has parts which we can change to accommodate it to different environments.

These parts we call config. The act of changing config we call configuration.

If config is not explicitly defined as constants, or as some sort of structured file holding such constants, it doesn’t really matter: the seams for the changes required to migrate between environments are still there, albeit hidden.

We usually have the following types of configs:

Connection strings for reaching some services
Credentials for getting access to some services
Properties needed to accommodate some restrictions or specifics of the environment.
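
As a hedged illustration, the three kinds of config listed above could be gathered into one explicit structure. This is only a minimal Python sketch: the names `Config`, `database_url`, `api_token` and `max_upload_mb`, and all the values, are hypothetical placeholders and not taken from any real project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """All environment-specific values of a hypothetical codebase in one place."""
    database_url: str   # connection string for reaching a service
    api_token: str      # credential for getting access to a service
    max_upload_mb: int  # property reflecting a restriction of the environment


# Two environments, one codebase: only the config differs.
# Every value below is a made-up placeholder.
WORKSTATION = Config(
    database_url="postgresql://localhost/app_dev",
    api_token="dev-token",
    max_upload_mb=512,
)
PRODUCTION = Config(
    database_url="postgresql://db.internal.example/app",
    api_token="read-from-a-secret-store",
    max_upload_mb=20,
)
```

Making the seams explicit like this does not change what the codebase depends on. It only makes the dependencies visible in one place.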

The final software product is a configured codebase being executed by a runtime inside an environment.

Database contents are environment, not code.

Migrations are config.

Modes of execution

Only two modes of execution of a product are really meaningful:

Production mode
Debug mode

We prefer to use the product in production mode. We also prefer to test the product in production mode. We prefer to get diagnostics in debug mode. We also prefer to actually develop the product in debug mode.

This aligns perfectly with, for example, the Debian tradition of packaging software into “just” packages and -dbg packages, whose binaries do not have their debug symbols stripped out. Debian also has -dev packages and -src packages, but that’s not important here, as they sit on different levels of abstraction from our current standpoint.

No reason to invent other modes of execution.

Production mode is characterized by optimisation for speed of execution and security for both the user data and the product consistency.

Debug mode is characterized by the strongest possible bias toward transparency in everything the application does. All available logging, assertion, and error-messaging facilities should be switched on. If applicable, the application should even unpack its own source code in this mode.
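
To make the two modes a bit more tangible, here is a minimal Python sketch of a product that switches its logging and error reporting by mode. The function names `configure_mode` and `describe_error`, the logger name `product`, and the `PRODUCT_MODE` environment variable are all assumptions made up for this example.

```python
import logging
import os


def configure_mode(mode: str) -> logging.Logger:
    """Set up logging for one of the two modes: 'production' or 'debug'."""
    if mode not in ("production", "debug"):
        raise ValueError(f"unknown mode: {mode!r}")
    logger = logging.getLogger("product")
    # Debug mode opens every logging level; production keeps warnings and errors only.
    logger.setLevel(logging.DEBUG if mode == "debug" else logging.WARNING)
    return logger


def describe_error(exc: Exception, mode: str) -> str:
    """Full detail in debug mode, a message that reveals nothing in production."""
    if mode == "debug":
        return f"{type(exc).__name__}: {exc}"
    return "Internal error"


if __name__ == "__main__":
    # PRODUCT_MODE is a hypothetical environment variable name.
    mode = os.environ.get("PRODUCT_MODE", "production")
    log = configure_mode(mode)
    log.debug("visible only in debug mode")
    print(describe_error(ValueError("bad input"), mode))
```

Python itself follows a similar split: running the interpreter with the `-O` flag strips `assert` statements, which is a production-style optimisation at the cost of transparency.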

Testing

It is quite easy to see that the target of all application-level tests is the product, not the codebase (and it’s dumb to think about testing config). As a side note, we usually do not test the environment specifically, as the whole point of the environment is that we rely on it.

What do the lowest-level unit tests do?

Unit tests take small slices of the codebase and treat each of them as a microscopic product inside a minimal environment consisting only of the runtime.

This property of unit tests does two things:

Greatly reduces the complexity of testing, as we almost never need to configure the codebase under test.
Greatly reduces the usefulness of the test in question, because we check not the feature set of the product but the feature set of a small slice of it.
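
A small sketch of what such a microscopic product looks like in practice, using Python's standard `unittest` module. The function `slugify` is a hypothetical slice of a codebase invented for this example. Note that the test needs no config and no services at all, only the runtime.

```python
import unittest


def slugify(title: str) -> str:
    """A small, hypothetical slice of a codebase: turn a title into a URL slug."""
    words = "".join(c.lower() if c.isalnum() else " " for c in title).split()
    return "-".join(words)


class SlugifyTest(unittest.TestCase):
    # No config and no services: the runtime alone is the environment.
    def test_spaces_become_dashes(self):
        self.assertEqual(slugify("On the Theory of Deploy"), "on-the-theory-of-deploy")

    def test_punctuation_is_dropped(self):
        self.assertEqual(slugify("Hello, World!"), "hello-world")


if __name__ == "__main__":
    unittest.main()
```

Both effects are visible here: the test is trivial to set up, and it says nothing about whether the product as a whole serves correct URLs.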

The goal of testing was already mentioned: to make sure that the product really provides value for its user. Value for the user is determined by the feature set of the product; the written listing of that feature set we call the specification of the product.

Given the specification, one can check whether the product really provides the features for the user.

This process is called testing.

The agent performing the testing can be either a human, in which case we’re talking about manual testing, or some other program, in which case we’re talking about automated testing.

When we’re doing manual testing, the specification can be written in any language. When we’re doing automated testing, the specification should be written in a language parsable by the automated tester. Even then, we usually still need a textual description of the feature set of the product. Thus, it’s rational to have a specification readable by both the human and the automated tester.

From this follows the natural need for languages like Gherkin and tools like Concordion.
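
To show the idea of one specification serving both readers, here is a toy Python sketch. It is not a Gherkin parser and does not use any real BDD library; it only borrows the Given/When/Then wording. The `step` decorator, the `run_spec` function, and the shopping-cart "product" (just a dictionary here) are all invented for illustration.

```python
import re

# The specification: plain text a human can read as-is.
SPEC = """\
Given a cart with 2 items
When the user adds 3 items
Then the cart has 5 items
"""

STEPS = []  # (compiled pattern, handler) pairs


def step(pattern):
    """Register a handler for every specification line matching `pattern`."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r"Given a cart with (\d+) items")
def given_cart(state, count):
    state["items"] = int(count)


@step(r"When the user adds (\d+) items")
def when_add(state, count):
    state["items"] += int(count)


@step(r"Then the cart has (\d+) items")
def then_cart_has(state, count):
    assert state["items"] == int(count), f"expected {count}, got {state['items']}"


def run_spec(text):
    """Execute each line of the specification; fail on unknown or violated steps."""
    state = {}
    for line in text.strip().splitlines():
        for pattern, handler in STEPS:
            match = pattern.fullmatch(line.strip())
            if match:
                handler(state, *match.groups())
                break
        else:
            raise LookupError(f"no step matches: {line!r}")
    return state


if __name__ == "__main__":
    run_spec(SPEC)
    print("specification holds")
```

The text in `SPEC` is what the human reads, and the registered steps are what make the same text executable. A real tool adds much more, but the division of labour is the same.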

Purity

A codebase consists mainly of two parts:

One which is in contact with the environment
One which is not in contact with the environment

The parts of the codebase which are not in contact with the environment can be called pure.

By definition, you can cover only the pure part of the codebase with unit tests.

Some languages, like Haskell, force you to explicitly split your codebase into pure and environment-dependent parts.
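
In a language that does not enforce the split, it can still be kept by convention. A minimal Python sketch follows, with the hypothetical functions `greeting` (pure) and `main` (environment-dependent). Unlike Haskell, nothing in Python stops `greeting` from reading the clock itself, so the purity here rests on discipline alone.

```python
import datetime


def greeting(name: str, hour: int) -> str:
    """Pure: the result depends only on the arguments."""
    part = "morning" if hour < 12 else "afternoon" if hour < 18 else "evening"
    return f"Good {part}, {name}!"


def main() -> None:
    """Environment-dependent: reads the system clock and writes to stdout."""
    now = datetime.datetime.now()
    print(greeting("reader", now.hour))


if __name__ == "__main__":
    main()
```

Only `greeting` can be unit tested in the sense used here. Checking `main` requires controlling the clock and capturing the output, which means controlling a piece of the environment.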

Unit tests treat code as a product. They should be written as a set of examples of how to use the code they are testing. This encourages the use of domain-specific languages to reduce duplication in test setup and teardown. This, in turn, encourages the use of domain-specific languages in the production code, to maintain the same level of abstraction as in the unit tests, and general readability.

From this follows the natural need for DSLs.

Deploy

Deploy is the act of transferring the codebase from whatever storage it is in to the target machine and configuring it for the environment of that machine, thus making the product available there.

This term is independent of whether we are talking about compiled languages or not.

More than that, with compiled languages the compilation step is neither a deploy nor a configuration. Compiling a codebase just transforms it into a form understandable by the natural runtime of machine code: the microprocessor itself, with the OS supporting it.

We can safely skip the compilation if we can afford to run the product on the runtime of an interpreter, be it JIT-compiling or line-by-line.

For compiled languages, the natural notion of deploy is the “installation” of the software into the OS. The act of configuring the compiled codebase for the new environment is performed by the installer program.

For scripting languages in the Web application development domain, there’s usually no “installation” step. We just copy the codebase verbatim to the target machine, manually change several lines in the script files dedicated to holding the config, and consider the accommodation to the new environment done.

Configurator

The well-known problem called “it was working on my machine” arises from ignoring the facts that:

There are environments other than your workstation
Your codebase depends on the environment

The latter is a lot more significant than the former.

The config is just a part of your codebase, and so it’s just plain text.

Therefore, it is suggested that the following installation script will suffice for any codebase out there, no matter what programming language it is written in.

Take the codebase and the config as input.
The config is a listing of commands.
A command either tells the installer to change some text token in some file from the codebase,
or tells the installer to rename/copy/move some file from the codebase to some other place.

Such a script does not even deserve to be called a “build system”.

Each config file, holding commands, corresponds to one of the environments into which the product can be installed. The codebase itself will hold only placeholders and possibly “example” files holding bunches of placeholders, which need to be moved to appropriate places.
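
The installation script described above can be sketched in a few dozen lines of Python. Everything concrete in this sketch is an assumption made for the example: the command syntax (`replace <file> <token> <value>`, `copy <source> <destination>`, `move <source> <destination>`), the `@DB_URL@` placeholder style, and the file names. Treat it as one possible reading of the idea, not a finished tool.

```python
import shlex
import shutil
import tempfile
from pathlib import Path


def install(codebase: Path, config_text: str) -> None:
    """Apply each command from the config to the codebase directory, in order."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        command, *args = shlex.split(line)
        if command == "replace":
            # replace <file> <token> <value>: change a text token in a file
            name, token, value = args
            path = codebase / name
            path.write_text(path.read_text().replace(token, value))
        elif command in ("copy", "move"):
            # copy|move <source> <destination>: relative paths are resolved
            # against the codebase directory, absolute paths are used as given
            source, destination = (codebase / a for a in args)
            destination.parent.mkdir(parents=True, exist_ok=True)
            if command == "copy":
                shutil.copy(source, destination)
            else:
                shutil.move(str(source), str(destination))
        else:
            raise ValueError(f"unknown command: {command!r}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # The codebase ships only an "example" file full of placeholders.
        (root / "settings.example.ini").write_text("db = @DB_URL@\n")
        # The config for one hypothetical environment: a local workstation.
        install(root, """
            copy settings.example.ini settings.ini
            replace settings.ini @DB_URL@ "postgresql://localhost/app_dev"
        """)
        print((root / "settings.ini").read_text())
```

Installing to another environment means running the same script with another config file. The codebase itself is never edited by hand.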

Only the codebase should be pushed to the source control system. Config files should be distributed by other means, either more secure (in the case of sensitive credentials) or less secure (in the case of local development workstations).

Build system

It is suggested that, conceptually, there’s no real need for build commands other than the following:

1. Make the documentation
   1a. Hand-crafted user-level guides.
   1b. Autogenerated API reference.
2. Perform the testing
   2a. Acceptance end-to-end testing.
   2b. Unit testing.
   2c. Any other varieties of testing like integration, performance, security, etc.
3. Static analysis
   3a. Conformance to the coding standards and best practices.
   3b. Statistics and metrics.
4. Deploy
   4a. Packaging for distribution.
   4b. Delivering to the target machine.
   4c. Installation (reconfiguration) to the target environment.