Friday, December 28, 2012

From raw idea to useful source code

A couple months ago I had an Idea.

I even blogged about it: A lookup service for US National Weather Service codes. Those codes are necessary to access their machine-readable products.

In this post, I will show how I developed the idea into some code, how I grew the code into a project, added structure and version control, and finally moved the project onto online hosting.

This is not the only way to create a project.
This is probably not the best way for many projects.
It's just the way I did it, and seeing it may help you avoid the most common mistakes.

You can see my final product hosted online at Launchpad.



From Idea to Code:

I know barely enough C to be able to ask where the bathroom is, so it's easier for me to use Python.

Code starts out as a single Python script:
- geolocation.py

As we add more features, a single script gets big and unwieldy, so we break it into smaller pieces.

For example, this structure easily allows more interfaces to be added.
- geolocation_service.py
- command_line_interface.py
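
Here is a minimal sketch of that split, squeezed into one file so it runs on its own. The function names and the one-entry sample table are my assumptions for illustration, not the project's real code; in the real layout the lookup function would live in geolocation_service.py and the wrapper in command_line_interface.py.

```python
#!/usr/bin/env python3
"""Sketch: a lookup function kept separate from its command-line wrapper."""
import argparse

# Hypothetical sample data; the real project reads lookup databases instead.
SAMPLE_ZIPCODES = {"53207": ("kmke", "mkx", "wiz066")}


def lookup_zipcode(zipcode):
    """Service layer: return (observation, radar, zone) codes, or None."""
    return SAMPLE_ZIPCODES.get(zipcode)


def main(argv=None):
    """Command-line layer: parse arguments, call the service, print the result."""
    parser = argparse.ArgumentParser(description="Look up weather codes")
    parser.add_argument("zipcode", help="five-digit US zipcode")
    args = parser.parse_args(argv)

    codes = lookup_zipcode(args.zipcode)
    if codes is None:
        print("No entry for " + args.zipcode)
        return 1
    print(",".join(codes))
    return 0


# Demo call with an explicit argument list. A real script would instead end with:
#   if __name__ == "__main__": raise SystemExit(main())
exit_code = main(["53207"])
```

Because the command-line layer only calls lookup_zipcode(), any other interface can call the same function without touching the lookup logic.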

Let's add a dbus interface, too. Dbus will send messages to the interface if it knows about it. We let dbus know about it using a service file.
- dbus_interface.py
- dbus_service_file.service

Let's add an http interface, so not everyone in the world needs to download 5-6MB of mostly-unused lookup databases:
- http_interface.py
- specialized_webserver.py
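
To give a feel for what an http interface could look like, here is a rough sketch using only Python's standard library. The query format (?zipcode=...), the JSON reply, and the sample table are assumptions of mine; the real http_interface.py and specialized_webserver.py may look quite different.

```python
#!/usr/bin/env python3
"""Sketch: a tiny HTTP lookup endpoint built on the standard library."""
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# Hypothetical sample data standing in for the lookup databases.
SAMPLE_ZIPCODES = {"53207": ["kmke", "mkx", "wiz066"]}


class LookupHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Pull the zipcode out of a query string like /lookup?zipcode=53207
        query = parse_qs(urlparse(self.path).query)
        zipcode = query.get("zipcode", [""])[0]
        codes = SAMPLE_ZIPCODES.get(zipcode)

        body = json.dumps({"zipcode": zipcode, "codes": codes}).encode("utf-8")
        self.send_response(200 if codes else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; the default logs every request to stderr


def make_server(port=0):
    """Create the server on localhost. Port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), LookupHandler)

# To run it by hand:
#   server = make_server(8080)
#   server.serve_forever()
```

With something like this running, clients only need an http request, not their own copy of the databases.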

Let's go back and formalize how we create the databases:
- database_creator.py

We have a lot of hard-coded variables in these scripts. Let's break them out into a config file.
- configfile.conf
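
As a sketch of how the scripts might read such a file, here is Python's standard configparser at work. The section name, the option names, and the values are all made up for this example; they are not the project's actual settings.

```python
#!/usr/bin/env python3
"""Sketch: reading settings from an INI-style config file."""
import configparser

# Hypothetical contents of configfile.conf
SAMPLE_CONFIG = """
[server]
port = 8080
data_dir = /usr/share/weather-location
cache_dir = /var/cache/wbs-webserver
"""


def load_settings(text):
    """Parse config text and return the settings as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)      # for a file on disk, use parser.read(path)
    server = parser["server"]
    return {
        "port": server.getint("port", fallback=8080),
        "data_dir": server.get("data_dir", fallback="data"),
        "cache_dir": server.get("cache_dir", fallback="/tmp"),
    }


settings = load_settings(SAMPLE_CONFIG)
print(settings["port"], settings["data_dir"])
```

The fallback values keep the scripts working when an option is missing from the file.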

There are other possible files that we're not using. For example:
- Upstart config file (if we want the service to run/stop at boot or upon some system signal, lives in /etc/init)
- Udev rule file (if we want the service to run/stop when a device is plugged in, lives in /etc/udev/rules.d)

But that's a lot of files and scripts! 8 files, plus the databases.
 


Adding Version Control:

It's time to get serious about these eight files. We have invested a lot of time creating them, and it's time to start organizing the project so others can contribute, so we can track new bugs and features, and to protect all our invested work.

First, we need to introduce version control. Ideally, we would have done that from the start. But we didn't. So let's fix that.

Version control offers a lot of advantages:
    We can undo mistakes.
    It helps us package the software later.
    It helps us track bugs.
    It helps us apply patches.
    It helps us document changes.

There are plenty of good version control systems available. For this example, I'll use bazaar. The developers have a very good tutorial.

Installing bazaar:

$ sudo apt-get install bzr
$ bzr whoami "My Name <my.email@example.com>"

Since we didn't start with proper version control, we need to create a new directory using version control, move our files into it, and add our files to version control.

$ bzr init-repo My_NEW_project_directory
$ bzr init My_NEW_project_directory
$ mv My_OLD_project_directory/* My_NEW_project_directory/
$ cd My_NEW_project_directory
$ bzr add *

Finally, we need to clean up the old directory, and commit the changes.

$ rmdir ../My_OLD_project_directory
$ bzr commit -m "Initial setup"


Organizing the code

My project directory is starting to get disorganized, with eight scripts and files, plus six database files, plus version control, plus more to come. I'm going to restructure my project folder like this:

My_project_directory
  +-- data   (all the database files)
  +-- src    (all the python scripts and other non-data files)
  +-- .bzr   (bzr's version control tracking)


Once version control is active, we cannot just move things around. We need to use the version control tools so it can keep tracking the right files.

$ bzr mkdir data src
$ bzr mv *.gz data/
$ bzr mv *.py src/
$ bzr mv dbus_service_file.service src/
$ bzr mv configfile.conf src/

See how bazaar adds the directories and performs the moves?

Now My_project_directory should be empty of files. Once reorganization is complete, remember to commit the change:

$ bzr commit -m "Reorganize the files to a better project structure"




Integrating into the system:

We have a problem with our eight files. They run beautifully, but only if they are in our home directory.

That won't work in the long run. A server should not be run as a user with shell access - that's a security hole. Nor should it be run out of a user's /home. Nor should it be run as root. Also, other applications that are looking for our server won't find it - all the files are in the wrong places.

So we need to put our files into the right places. And often that means fixing the scripts to replace hard-coded temporary paths (like '~/server/foo') with the proper locations ('/usr/lib/foo-server/foo').
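
One way to make that fix less painful is to keep the locations in a single place instead of scattering them through the scripts. This is only a sketch of one possible approach, not what my scripts actually do; the directory names simply echo the placeholder paths above.

```python
#!/usr/bin/env python3
"""Sketch: resolve file locations in one place instead of hard-coding them."""
import os

# Placeholder locations, echoing the example paths in the text.
CANDIDATE_DIRS = [
    "/usr/lib/foo-server",             # installed location
    os.path.expanduser("~/server"),    # development copy
]


def first_existing_dir(candidates):
    """Return the first candidate that is an existing directory, else None."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


def resource_path(filename, candidates=CANDIDATE_DIRS):
    """Build the full path to filename inside the first directory that exists."""
    base = first_existing_dir(candidates)
    if base is None:
        raise FileNotFoundError("none of these directories exist: %r" % (candidates,))
    return os.path.join(base, filename)
```

The installed location is checked first, so the development copy is only used on a machine where the application has not been installed.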

Where are the right places?

The Linux Filesystem Hierarchy Standard (FHS) is used by Debian to define the right places.

Two files are directly user-launched in regular use:
- specialized_webserver.py: /usr/bin
- command_line_interface.py: /usr/bin

The database files are read-only and available to any application:
- database files: /usr/share

Three files are launched or imported by other applications or scripts:
- geolocation_service.py: /usr/lib
- dbus_interface.py: /usr/lib
- http_interface.py: /usr/lib

One file is user-launched only rarely, under unusual circumstances. It can live with the other scripts:
- database_creator.py: /usr/lib

The dbus service file will be looked for by dbus in a specific location:
- geolocation_dbus.service: /usr/share/dbus-1/services

Config files belong in /etc:
- geolocation.conf: /etc

Makefiles make organization easier:

Now that we know where the right places are, let's create a Makefile that will install and uninstall the files to the right place. Our originals stay where they are - the makefile copies them during the install, and deletes the copies during uninstall.

Makefiles are really config files for the make application (included in the build-essential metapackage). A makefile tells make which files depend upon which, which files to compile (we won't be compiling), where the installed application files should be located, and how to remove the application.

Here is a sample makefile for my project (wbs-server):
DATADIR = $(DESTDIR)/usr/share/weather-location
LIBDIR  = $(DESTDIR)/usr/lib/wbs-server
BINDIR  = $(DESTDIR)/usr/bin
DBUSDIR = $(DESTDIR)/usr/share/dbus-1/services
CONFDIR = $(DESTDIR)/etc
CACHDIR = $(DESTDIR)/var/cache/wbs-webserver

install: 
 # Indents use TABS, not SPACES! Space indents will cause make to fail
 mkdir -p $(DATADIR)
 cp data/*.gz $(DATADIR)/

 mkdir -p $(LIBDIR)
 cp src/geolocation.py $(LIBDIR)/
 cp src/wbs_dbus_api.py $(LIBDIR)/
 cp src/wbs_http_api.py $(LIBDIR)/
 cp src/wbs_database_creator.py $(LIBDIR)/

 cp src/wbs_cli_api.py $(BINDIR)/
 cp src/wbs_webserver.py $(BINDIR)/
 cp src/wbs-server.service $(DBUSDIR)/
 cp src/confile.conf $(CONFDIR)/wbs-server.conf
 mkdir -p $(CACHDIR)

uninstall:
 rm -rf $(DATADIR)
 rm -rf $(LIBDIR)

 rm -f $(BINDIR)/wbs_cli_api.py
 rm -f $(BINDIR)/wbs_webserver.py
 rm -f $(DBUSDIR)/wbs-server.service
 rm -f $(CONFDIR)/wbs-server.conf
 rm -rf $(CACHDIR)

Let's save the makefile as Makefile, and run it using sudo make install and sudo make uninstall.

We run a test:

$ sudo make install
$ /usr/bin/wbs_cli_api.py zipcode 43210
bash: /usr/bin/wbs_cli_api.py: Permission denied

Uh-oh. Let's investigate:

$ ls -l /usr/bin/wbs_cli_api.py 
-rw-r--r-- 1 root root 3287 Dec 23 20:46 /usr/bin/wbs_cli_api.py

Aha. The owner and the read/write permissions are fine, but the executable flag is not set. Let's uninstall the application so we can fix the makefile.
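
If you would rather check the flag from Python than squint at ls output, here is a small sketch using the standard os and stat modules. The helper names are mine, and the demo works on a throwaway temp file rather than the installed script.

```python
#!/usr/bin/env python3
"""Sketch: inspect and set the executable flag from Python."""
import os
import stat
import tempfile


def is_executable_by_owner(path):
    """True if the owner's execute bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


def add_executable_bits(path):
    """Set the execute bit for user, group, and other, like 'chmod a+x'."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


# Demo on a throwaway file that starts with the same mode as in the listing
fd, demo_path = tempfile.mkstemp()
os.close(fd)
os.chmod(demo_path, 0o644)
before = stat.filemode(os.stat(demo_path).st_mode)   # '-rw-r--r--'
add_executable_bits(demo_path)
after = stat.filemode(os.stat(demo_path).st_mode)    # '-rwxr-xr-x'
print(before, "->", after)
os.remove(demo_path)
```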

$ sudo make uninstall

In the makefile, we can make a few changes if we wish. We can set the executable flag. We can also create links or symlinks, or rename the copy.

For example, wbs_cli_api.py is a rather obscure name for a command-line executable. Instead of copying it to /usr/bin, let's copy it to /usr/lib with its fellow scripts, make it executable, and create a symlink in /usr/bin with a better name like 'weather-lookup'.

install:
        ...
 cp src/wbs_cli_api.py $(LIBDIR)/
 chmod +x $(LIBDIR)/wbs_cli_api.py
 ln -s $(LIBDIR)/wbs_cli_api.py $(BINDIR)/weather-lookup
        ...

uninstall:
        ...
 rm -f $(BINDIR)/weather-lookup
        ...


Another example: It's a bad idea to run a webserver as root. So let's add a couple lines to the makefile to create (and delete) a separate system user to run the webserver.

USERNAM = wbserver
        ...
install:
        ...
 adduser --system --group --no-create-home --shell /bin/false $(USERNAM)
 chgrp $(USERNAM) $(LIBDIR)/*
 chgrp $(USERNAM) $(CACHDIR)
 # Launch the webserver using the command 'sudo -u wbserver wbs-server'
        ...

uninstall:
        ...
 deluser --system --quiet $(USERNAM)
        ...




Sharing the code

Now we have a complete application, ready to distribute.

Thanks to the Makefile, we include a way to install and uninstall.

It's not a package yet. It's not even a source package yet. It's just source code and an install/uninstall script.

We can add a README file, a brief description of how to install and use the application.

We can also add an INSTALL file, detailed instructions on how to unpack (if necessary) and install the application.

It would be very very wise to add a copyright and/or license file, so other people know how they can distribute the code.

After all that, remember to add those files to version control, and then commit the changes:

$ bzr commit -m "Initial code upload. Add README, INSTALL, copyright, and license files."

Finally, we need a place to host the code online. Since I already have a Launchpad account and use bzr, I can easily create a new project on Launchpad.

And then uploading the version-controlled files is as simple as:

$ bzr launchpad-login my-launchpad-name
$ bzr push lp:~my-launchpad-name/my-project/trunk

You can see my online code hosted at Launchpad.


Next time, we'll get into how to package this source code.

Sunday, December 16, 2012

Very Simple Database in Python

I'm experimenting with big lookup tables for my weather code lookup server. Instead of using a big configparser file, I want to try a small database.

Python's dbm bindings are included in the default install of Ubuntu. They're light and easy to use.

#!/usr/bin/env python3
import dbm.gnu                # python3-gdbm package
zipcodes = '/tmp/testdb'

# Create a new database with one entry
# Schema: Key is Zipcode
# Value is Observation_Station_Code, Radar_Station_Code, Forecast_Zone
zipc = dbm.gnu.open(zipcodes, 'c')
zipc['53207'] = b'kmke,mkx,wiz066'

# Close and reopen the database
zipc.close()
zipd = dbm.gnu.open(zipcodes, 'r')

# List of database keys
keys = zipd.keys()

# Retrieve and print one entry
print(zipd[keys[0]].decode('utf-8').split(','))

zipd.close()

It works very well and is very fast. However, the database is binary (not text), so it's not easy to view or edit with other applications.
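
One workaround for the viewing problem is a tiny script that dumps the database as text. This is a sketch under a couple of assumptions: it uses the generic dbm.open() instead of dbm.gnu so it runs with whichever backend Python finds, and it builds its own throwaway database in a temp directory instead of reading /tmp/testdb.

```python
#!/usr/bin/env python3
"""Sketch: dump a dbm database as readable text lines."""
import dbm
import os
import tempfile


def dump_database(path):
    """Return the database contents as sorted 'key: value' text lines."""
    with dbm.open(path, "r") as db:
        return sorted(
            "%s: %s" % (key.decode("utf-8"), db[key].decode("utf-8"))
            for key in db.keys()
        )


# Build a throwaway database holding the same entry as the example above
db_path = os.path.join(tempfile.mkdtemp(), "testdb")
with dbm.open(db_path, "c") as db:
    db["53207"] = b"kmke,mkx,wiz066"

for line in dump_database(db_path):
    print(line)
```

Going the other direction (text back into the database) would need a similar small script, so editing is still clumsier than with a plain text file.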