Python for Object Based Image Analysis (OBIA)

Satellite image segmentation and classification in Python

Posted by Carlos De La Torre 1 year, 11 months ago Comments


A few weeks ago I posted about the benefits and power of the Python ecosystem for geospatial data processing. As an example, we analyzed a typical task in this field (satellite image classification) with an straightforward, machine-learning approach.

In this case, we will go further than that.

The projects that we (in Machinalis) work on a daily basis, applying machine-learning related methods and algorithms, require more than straightforward, out-of-the-box, approaches. But even with a higher order of sophistication and complexity, Python is still a perfect choice.

Obviously any production-ready, machine-learning based, image processing system require a whole lot of work on experimentation, hyperparameter’s tuning, optimization and stuff like that. We are not going to cover those issues in this post.

Still, we will somehow expand the aforementioned post. We are going to perform a new classification but this time based on a different paradigm: Geographic Object-Based Image Analysis.

OBIA: Object-Based Image Analysis

OBIA is a relatively new and evolving paradigm that builds on image analysis concepts that have been used for decades: segmentation, edge-detection, feature extraction and classification.

Basically, an OBIA approach for image classification would be

  1. Perform an image segmentation
  2. Classify the segments (instead of the pixels)

There already are many studies demonstrating that this technique can yield better results than more traditional, pixel-based, classification methods (see [1]).

Although the more obvious and clear advantages of the OBIA approach make sense with high resolution imagery, there are solid studies and evidence where this approach has been used with images of medium or coarse resolution. For example this paper [2] evaluates OBIA Classification of crops with images of 30 m and 15 m pixel resolution (somehow similar to the Landsat data).

But it is not in the scope of this post to discuss any further this concept or paradigm. There’s a lot of very interesting and specific literature about it (for example [3]).

Having said all this, let’s focus on the main objective of this post.

Python implementation of an object-based image classification

Code and data

First of all, the code related to this post can be found here:

The code shown in this post has been simplified. In the repo you’ll see a more complete and complex version.

For the Python enthusiasts, there’s a Jupyter Notebook available in the repo.

If you need sample data to play with, you can use the same as in the other blog post and you can download it from here.


As described before, the strategy that we are going to describe has got two stages:

First, an image segmentation is performed to cluster contiguous areas of similar pixels. Each resulting segment is modeled aggregating the information of its pixels. So here we see a change in scale: from pixel level we move to a segment, cluster or object level.
Once the training dataset is defined (a subset of segments) a classifier is trained and used to classify all the segments.

The technological stack is similar to the one presented in the last post, so we are not going to expand on that. Although we will mention some new requirements.

Also, not all the code will be presented here. Only some specific parts of it. As mentioned before, the code can be found here:

In particular, we will skip describing the first part of the code, including reading the input data, since it is similar to what we did in the last post.


Segmentation of digital images using computers started over 50 years ago. By now, there are several techniques or methods that can be used (thresholding, clustering, edge detection, graph partitioning, etc.). What’s more, there are many efficient implementations in many programming languages.

In Python, many well known packages can be used for this: scikit-learn, OpenCV, scikit-image

As usual, the selection of the algorithm together with the fine tuning of its parameters is a complex task. So many variables must be taken into account that it goes far beyond the scope of this post. Right now, we are just taking a general, high-level look at the possibilities and features provided by the tools of the Python ecosystem.

In the course of a real project, the task of deciding the segmentation algorithm to use and its final configuration would take a considerable amount of work. Furthermore, thinking in a long term, automated system, we would probably need to develop support infrastructure to permanently monitor, optimize and update the configuration parameters.

So, for the sake of simplicity, we are just going to make a pseudo-arbitrary choice.


The scikit-image project includes the skimage.segmentation module which provides several useful algorithms: active_contour, Felsenszwalb, quickshift, random walker, slic.

Most of them support multi-band data. They call it multichannel 2D images. So for example, to use the Quickshift algorithm, we just do:

img = exposure.rescale_intensity(bands_data)
clusters = quickshift(img, kernel_size=7, max_dist=3, ratio=0.35,
segment_ids = np.unique(clusters)

A couple of details about scikit-image

  • the segmentation algorithms expect the image data to have values between 0.0 and 1.0. That’s why we use the rescale_intensity function in the exposure module.
  • Typically, multichannel images are assumed to be RGB. To avoid problems with that, the convert2lab=False parameter is used.
  • in line with scikit-learn, scikit-image works with Numpy arrays. So the obtained segments can be easily processed later, without needing to learn nothing new.

At this point, we have a segmented image: each pixel value is an integer number with a segment label.

Different segmentation results

This figure shows the input image next to the results of different segmentation algorithms: quickshift and Felsenszwalb

The code in the repo is actually using the Felsenszwalb segmentation results.


Next we have to perform a supervised classification. Before we can do that, we have to do the following:

  1. Feature extraction: choose a model to represent our segments. A set of features that describe a segment’s homogeneity and, at the same time, distinguishes different segments.
  2. Training dataset definition: determine which segments must be used to train the classifier.

Feature extraction

In order to classify segments instead of pixels we need to provide a segment representation.

Right now, a segment is defined by a set of pixels. A pixel is modeled by the 7-dimensional vector of radiometric data (one dimension per band).

Again, there’s a lot of work to do here and we are just going to use a very simple statistical model: with all the pixels of a given segment, we compute certain statistics for each band and use that as features:

def segment_features(segment_pixels):
    For each band, compute: min, max, mean, variance, skewness, kurtosis.
    features = []
    _, n_bands = segment_pixels.shape
    for b in range(n_bands):
        stats = scipy.stats.describe(segment_pixels[:,b])
        band_stats = list(stats.minmax) + list(stats)[2:]
        features += band_stats
    return features

Now a segment is defined by a vector of constant length 42 (7 bands x 6 statistics). So the representation of the segment is independent of its size. The gain is potentially very big: a segment with only 100 pixels used to be represented by 700 values (100 pixels x 7 bands).

The next step is to transform our input data, from a pixel representation (the full image) to objects (or segments).

objects = []
objects_ids = []
for segment_label in segment_ids:
    segment_pixels = img[segments==segment_label]
    segment_model = segment_features(segment_pixels)
    # Keep a reference to the segment label

In the objects variable we have each segment represented by its features’ vector, and no more by its pixels (to each vector we added a dimension to keep track of the segment’s number or id).

Training dataset definition

Now that we have a valid, concise representation of the segments, we need to define which of them are going to be used for training.

From the sample data we get labeled pixels. Then we have to match those pixels with the corresponding segments. There could be ambiguities to solve: samples of different classes could end up in the same segment (after all, the segmentation was unsupervised).

These sort of problems must be addressed and depend on the available data. The related code in the Jupyter Notebook does all that (including the verification that there are no conflicts in the test data).

training_labels = []
training_objects = []
for klass in classes:
    class_train_objects = [v for i, v in enumerate(objects) if objects_ids[i] in segments_per_klass[klass]]
    training_labels += [klass] * len(class_train_objects)
    print("Training samples for class %i: %i" % (klass, len(class_train_objects)))
    training_objects += class_train_objects
Resulting training segments

This figure shows the resulting training segments, obtained from individual sample pixels

Classifier setup

Once we have our training data ready, we can classify all the objects. This step is very simple and similar to what was done in the previous post:

classifier = RandomForestClassifier(n_jobs=-1), training_labels)
predicted = classifier.predict(objects)

Now that we have predicted a class for each segment it’s time to go back to the pixels dimension. To do this we are going to start from segmented image (where each pixel has got a segment identifier) and match it with the predicted data, which assigns a class label to each segment:

classification = np.copy(segments)
for segment_id, klass in zip(objects_ids, predicted):
    classification[classification==segment_id] = klass

And that’s it: we have a pixel-by-pixel classification.

Resulting classification

This figure shows the resulting classification

To assess the results we could do part of what we did in the previous post but there’s a problem that must be dealt with: the verification data is given as pixel samples, just like our training samples. But then the segmentation process converted the training pixels into training objects (segments), thus growing the training regions. Now, using the validation data without proper care we risk validating with data that was actually used for training. That’s a common methodological error.

Closing up

The actual results of the classification are not relevant. They depend on a lot of factors, some of which have been mentioned briefly but none of them have been seriously developed in this post.

What we’ve seen here helps as an overview of a strong and solid machine-learning technique for images processing, that’s complex, highly sophisticated and not yet widely adopted.

But what’s really important to me is to show the power of the Python ecosystem for these kind of tasks.

Please, feel free to comment, contact us or whatever. I’m @py_litox on Twitter.


[1]Blaschke T., 2009, Object based image analysis for remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing, 65 (2010), pp. 2–16
[2]Peña et al., 2013, Object-Based Image Classification of Summer Crops with Machine Learning Methods. Remote Sens. 2014, 6, 5019-5041.
[3]Blaschke et al.,, 2013, Geographic Object-Based Image Analysis – Towards a new paradigm. ISPRS Journal of Photogrammetry and Remote Sensing, 87 (2014), pp. 180–191

Previous / Next posts