How to build a benthic coral reef analyzer

In this post we explore the development of a benthic coral reef analyzer, built in partnership with ReefSupport to improve the tools for monitoring coral reefs and marine environments.

Here is the full coral analysis pipeline at a glance:

Figure: The coral analysis pipeline: capture → segment → classify → measure coral cover over time

Segmenting coral in benthic imagery with computer vision has the potential to quantify the long-term growth or decline of coral cover within marine protected areas.

Project Scope

Our collaboration develops an underwater benthic imagery model that identifies and locates the functional groups in a reef and is flexible enough to be used across marine regions worldwide.

Figure: The benthic imagery analysis system by ReefSupport

We start by distinguishing hard from soft coral and then add finer taxonomic detail as the system matures. This builds a broad foundation for comprehensive reef analysis.

Provided Datasets

The data provided by ReefSupport is made available on a publicly hosted Google Cloud bucket. It comes in two forms:

Point (sparse) labels

Randomly placed points in each image are classified, typically 50–100 points per image. Sources: IBF, Reefolution, and Seaview.

Mask (dense) labels

Full segmentation masks for hard and soft corals. Sources: CoralSeg, plus ReefSupport's own carefully annotated subsets covering reefs worldwide.

Each image is associated with a dense stitched mask made of all the individual coral instances.

Figure: Associated labels from the ReefSupport dataset: stitched and individual hard and soft coral instance masks

Exploratory Data Analysis

Before modeling, we explored the dataset closely and uncovered several data-quality issues that needed fixing first.

Data quality issues

Empty masks

Some stitched masks were entirely black: 532 in SEAVIEW_PAC_USA and 328 in SEAVIEW_ATL. Removing these empty masks gave a cleaner dataset and improved performance during training and evaluation.
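The sketch below shows one way to implement this filtering step. It assumes each stitched mask is loaded as a NumPy array whose unlabelled (black) pixels are zero. The `samples` list of `(image_id, mask)` pairs and both helper names are hypothetical.

```python
import numpy as np


def is_empty_mask(mask: np.ndarray) -> bool:
    """Return True if a stitched mask has no labelled pixels (entirely black)."""
    # An all-black mask has no non-zero value in any channel.
    return not np.any(mask)


def filter_empty(samples):
    """Keep only (image_id, mask) pairs whose stitched mask labels at least one pixel."""
    return [(image_id, mask) for image_id, mask in samples if not is_empty_mask(mask)]
```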

Figure: Empty mask samples

Low quality labels

The dense labels in SEAVIEW_PAC_USA covered almost all the coral in an image with one large mask instead of outlining each individual colony, so we excluded them from the training set.

Figure: Low-quality mask samples

Mismatched sparse and dense labels

We compared ReefSupport’s dense (mask) labels against the sparse (point) labels for the same images.

The samples shown illustrate cases where point labels contradict dense labels. Each white cross marks a mismatched point. In the first sample 17% of the points disagree with the mask, and in the last sample all of them (100%) do.
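For each image, the mismatch percentage is the fraction of its points whose class differs from the dense-mask pixel underneath. The sketch below computes it. It assumes point labels are pixel `(row, col, class_id)` tuples and the dense mask is a 2-D array of class ids using the same encoding. Both representations and the helper name are assumptions, not ReefSupport's file format.

```python
import numpy as np


def point_mismatch_rate(points, dense_mask: np.ndarray) -> float:
    """Fraction of sparse point labels that disagree with the dense mask.

    points: iterable of (row, col, class_id) tuples (hypothetical representation).
    dense_mask: 2-D integer array holding one class id per pixel.
    """
    points = list(points)
    if not points:
        return 0.0
    # A point is a mismatch when the mask pixel under it carries a different class.
    mismatches = sum(1 for r, c, cls in points if dense_mask[r, c] != cls)
    return mismatches / len(points)
```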

Data leakage

In one region, many images overlap with their neighbours within the same quadrats. If overlapping images land in different splits, the model effectively sees test content during training, which inflates its scores. Ordering the images by ID makes this clear: most images share content with their neighbours.

Figure: Sequence 1, four photos that overlap with their neighbours

Figure: Sequence 2, another four photos that overlap with their neighbours

Since the overlap is confined to one region, we gauge its impact by evaluating the model region by region.
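To see how exposed a split is to this kind of leakage, one option is to order the images by ID and count how many neighbouring pairs land in different splits. The sketch below does this. It assumes IDs sort in capture order (for example, because they are zero-padded) and that a hypothetical `split_by_id` dict maps each ID to its split.

```python
def cross_split_neighbour_fraction(split_by_id: dict) -> float:
    """Fraction of consecutive image pairs (ordered by ID) that fall into different splits.

    split_by_id: dict mapping image_id -> "train" | "val" | "test".
    If neighbouring images overlap, each such pair is a potential leak.
    """
    ids = sorted(split_by_id)
    pairs = list(zip(ids, ids[1:]))
    if not pairs:
        return 0.0
    crossing = sum(1 for a, b in pairs if split_by_id[a] != split_by_id[b])
    return crossing / len(pairs)
```

For reference, suppose each image were assigned independently at random with 80/10/10 ratios. A neighbouring pair would then share a split with probability 0.8² + 0.1² + 0.1² = 0.66, so about 34% of adjacent pairs would straddle two splits.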

Class imbalance

The dataset skews heavily towards hard coral, with roughly five times as many instances as soft coral. An imbalance like this biases a model towards the majority class and hurts its performance on the minority one.
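The sketch below measures the imbalance from YOLOv8 label lines, where each line is one instance with the class number first. It then derives inverse-frequency class weights, a common way to counteract imbalance. The class ids and the weighting scheme are illustrative assumptions and not necessarily what was used here.

```python
from collections import Counter


def instance_counts(label_lines):
    """Count instances per class from YOLOv8 segmentation label lines (one instance per line)."""
    counts = Counter()
    for line in label_lines:
        line = line.strip()
        if line:  # skip blank lines
            counts[int(line.split()[0])] += 1
    return counts


def inverse_frequency_weights(counts):
    """Weight each class by total / (num_classes * count), so rarer classes weigh more."""
    total = sum(counts.values())
    k = len(counts)
    return {cls: total / (k * n) for cls, n in counts.items()}
```

With a 5:1 hard-to-soft ratio, this scheme gives each soft-coral instance five times the weight of a hard-coral one.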

Figure: Class imbalance distributions

Data Preparation

YOLOv8 TXT format

To use the YOLOv8 ecosystem, we first convert the raw datasets into its expected format.

Figure: Conversion from individual masks to the YOLOv8 TXT format

Each line represents an instance of a class with a defined contour. It has the following format:

class_number x1 y1 x2 y2 x3 y3 ... xk yk
class_number x1 y1 x2 y2 x3 y3 ... xj yj

Here, the x and y coordinates are normalized by the image width and height respectively, so they always lie in the range [0, 1].

Example:

1 0.617 0.359 0.114 0.173 0.322 0.654
0 0.094 0.386 0.156 0.236 0.875 0.134

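In this example, the first line is a three-point contour of class 1 and the second a three-point contour of class 0. Each line corresponds to one individual mask instance.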
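To make the format concrete, here is a minimal parser for a single label line. It checks the normalization and maps the coordinates back to pixels. The function name and the image size passed to it are hypothetical.

```python
def parse_yolo_seg_line(line: str, img_w: int, img_h: int):
    """Parse one YOLOv8 segmentation label line into (class_id, pixel polygon).

    The line holds a class number followed by x y pairs normalized to [0, 1].
    """
    parts = line.split()
    class_id = int(parts[0])
    coords = [float(v) for v in parts[1:]]
    if len(coords) % 2 != 0:
        raise ValueError("expected an even number of coordinates")
    if not all(0.0 <= v <= 1.0 for v in coords):
        raise ValueError("coordinates must be normalized to [0, 1]")
    # De-normalize: x by the image width, y by the image height.
    polygon = [(x * img_w, y * img_h) for x, y in zip(coords[0::2], coords[1::2])]
    return class_id, polygon
```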

We use the OpenCV library to convert the dense individual masks into contour coordinates.

Data Modeling

Data Split

This section describes how we built the train/val/test splits for the different datasets.

For each region, we create a dedicated dataset with an 80/10/10 train/val/test split. We also build a global dataset with the same split ratios. Any image in a region's test set is also in the global test set, and likewise for the train and val splits. This design lets us evaluate models trained on region-specific datasets against the global dataset without any of their training images leaking into its test set.
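The sketch below implements this nested design. It assumes images are identified by unique string IDs grouped per region, and `split_ids` and `build_splits` are hypothetical helpers. Because each global split is the union of the corresponding regional splits, the containment property holds by construction.

```python
import random


def split_ids(ids, seed=0):
    """Shuffle ids deterministically and cut them into 80/10/10 train/val/test lists."""
    ids = sorted(ids)
    random.Random(seed).shuffle(ids)
    n_train = len(ids) * 4 // 5          # 80%, floored
    n_val = (len(ids) - n_train) // 2    # remainder halved; an odd image goes to test
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


def build_splits(ids_by_region, seed=0):
    """Split each region, then form each global split as the union of the regional ones."""
    regional = {region: split_ids(ids, seed=seed) for region, ids in ids_by_region.items()}
    global_splits = {"train": [], "val": [], "test": []}
    for splits in regional.values():
        for name, members in splits.items():
            global_splits[name].extend(members)
    return regional, global_splits
```

The rounding rule here is an illustrative choice: 80% floored for train, with the remainder halved between val and test. It happens to reproduce the per-region counts in the table below.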

Dataset Region split ratio train val test total
ALL ALL 80/10/10 1392 173 177 1742
SEAFLOWER BOLIVAR 80/10/10 196 24 25 245
SEAFLOWER COURTOWN 80/10/10 192 24 25 241
SEAVIEW ATL 80/10/10 264 33 33 330
SEAVIEW IDN_PHL 80/10/10 189 24 24 237
SEAVIEW PAC_AUS 80/10/10 467 58 59 584
TETES PROVIDENCIA 80/10/10 84 10 11 105

Instance Segmentation vs Semantic Segmentation

Semantic segmentation assigns a class label to each pixel in an image, such as ‘person,’ ‘dog,’ or ‘flower,’ grouping together pixels of the same class. Conversely, instance segmentation distinguishes between individual instances of objects within the same class, treating each one as a separate entity.

Figure: Semantic segmentation vs instance segmentation

For analyzing benthic coral reefs, we chose instance segmentation because it enables precise localization and counting of individual reef organisms, which semantic segmentation cannot provide.
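The sketch below shows what is lost when instances are collapsed into a semantic map. It assumes instance masks are binary NumPy arrays and uses 0 as a hypothetical background id. The semantic map keeps every pixel's class but not which colony the pixel belongs to. Two touching colonies of the same class therefore merge into one region that can no longer be counted.

```python
import numpy as np


def instances_to_semantic(instance_masks, class_ids, shape):
    """Collapse per-instance binary masks into one semantic label map.

    Each pixel takes the class id of the (last) instance covering it; 0 means background.
    Instance identity, and with it the colony count, is discarded.
    """
    semantic = np.zeros(shape, dtype=np.int64)
    for mask, cls in zip(instance_masks, class_ids):
        semantic[mask > 0] = cls
    return semantic
```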

Evaluation Metrics

We evaluate segmentation with mean IoU (mIoU) and the Dice coefficient. We avoid mean pixel accuracy because it is misleading on skewed datasets: a model that assigns most pixels to the dominant class can still score highly.

IoU (Jaccard index) measures the overlap between the predicted mask A and the ground-truth mask B, and mIoU averages it over the classes. Higher is better:

$$\mathit{IoU} = \dfrac{|A \cap B|}{|A \cup B|} = \dfrac{TP}{TP + FP + FN}$$

Dice coefficient (F1) also rewards overlap but weights true positives more heavily, which makes it well-suited to imbalanced data:

$$\mathit{DiceCoefficient} = \dfrac{2 \times TP}{2 \times TP + FP + FN}$$
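Both metrics can be computed per class from a pixel-level confusion matrix. The NumPy sketch below assumes ground truth and prediction are integer label maps of the same shape. The function names are hypothetical.

```python
import numpy as np


def pixel_confusion_matrix(y_true, y_pred, num_classes):
    """Pixel-level confusion matrix: rows = ground truth, columns = prediction."""
    idx = num_classes * y_true.ravel().astype(np.int64) + y_pred.ravel().astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def iou_and_dice(cm):
    """Per-class IoU = TP / (TP + FP + FN) and Dice = 2TP / (2TP + FP + FN)."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as the class but belonging to another
    fn = cm.sum(axis=1) - tp  # belonging to the class but predicted as another
    iou = tp / (tp + fp + fn)
    dice = 2 * tp / (2 * tp + fp + fn)
    return iou, dice  # mIoU and mDice are iou.mean() and dice.mean()
```

Both metrics come from the same counts, so per-class Dice is a monotonic function of IoU: Dice = 2·IoU / (1 + IoU). For example, the baseline's hard-coral IoU of 0.64 corresponds to a Dice of about 0.78, consistent with the baseline results reported below. A class that appears in neither map yields 0/0 and therefore `nan`.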

YOLOv8

Overview

We took a pretrained YOLOv8 model and fine-tuned it for our instance segmentation task. YOLOv8 is fast, accurate, and easy to work with, and it supports a range of tasks: object detection, tracking, instance segmentation, image classification, and pose estimation.

Figure: Segmentation on a benthic image: a photo in, each colony mapped out

Training

Baseline

We first established a baseline to gauge the approach: a medium-size pretrained model fine-tuned for 5 epochs on the train split.

mIoU IoU_hard IoU_soft IoU_other mDice Dice_hard Dice_soft Dice_other
0.70 0.64 0.58 0.89 0.82 0.78 0.73 0.94

Figure: Baseline results, quantitative: training metrics (left) and pixel-level confusion matrix (right)

Figure: Baseline results, qualitative

These initial results were promising, so we set out to improve performance further through careful hyperparameter selection.

Best Model

After hundreds of GPU-hours of hyperparameter search, we arrived at the best-performing models.

ReefSupport's hardware configurations and the models' intended use were not yet settled, and one possibility was running on live video streams from underwater cameras. We therefore aimed to offer a range of models. These go from ones small enough to embed on edge devices for real-time video segmentation to larger ones that deliver peak performance on high-end GPUs, which keeps the options open across deployment scenarios.

We fine-tune the pretrained YOLOv8 models for 140 epochs on images resized to 1024x1024 pixels, applying random flips and random rotations of up to 45 degrees during training.
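These settings map onto training arguments of the Ultralytics `YOLO.train` API (`epochs`, `imgsz`, `degrees`, `fliplr`, `flipud`). The sketch below assumes the `ultralytics` package and a hypothetical dataset config `coral_seg.yaml`. The flip probabilities are also assumptions, since only "random flipping" is specified.

```python
def train_args(stage: str, data: str = "coral_seg.yaml") -> dict:
    """Keyword arguments for Ultralytics `YOLO(weights).train(**kwargs)`.

    `data` points to a hypothetical dataset YAML; flip probabilities are assumptions.
    """
    if stage == "baseline":
        # Short run: 5 epochs, library defaults for everything else.
        return {"data": data, "epochs": 5}
    if stage == "best":
        return {
            "data": data,
            "epochs": 140,
            "imgsz": 1024,    # input image size (1024x1024)
            "degrees": 45.0,  # random rotation within +/- 45 degrees
            "fliplr": 0.5,    # assumed probability of a horizontal flip
            "flipud": 0.5,    # assumed probability of a vertical flip
        }
    raise ValueError(f"unknown stage: {stage!r}")
```

With `ultralytics` installed, `YOLO("yolov8m-seg.pt").train(**train_args("baseline"))` would launch a baseline-style run. The longer run would use another checkpoint, such as `yolov8l-seg.pt`, together with `train_args("best")`.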

Figure: Data augmentation, batch samples

mIoU IoU_hard IoU_soft IoU_other mDice Dice_hard Dice_soft Dice_other
0.85 0.80 0.81 0.94 0.92 0.89 0.90 0.97

Figure: Best model results, quantitative: training metrics (left) and pixel-level confusion matrix (right)

Figure: Best model results, qualitative

Evaluation

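The table below summarizes the best model's performance on the test set of each region. The abbreviations are sf_bol: SEAFLOWER BOLIVAR, sf_crt: SEAFLOWER COURTOWN, sv_atl: SEAVIEW ATL, sv_phl: SEAVIEW IDN_PHL, sv_aus: SEAVIEW PAC_AUS, and tt_pro: TETES PROVIDENCIA. The row "all" is the global test set: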

data mIoU IoU_hard IoU_soft IoU_other mDice Dice_hard Dice_soft Dice_other
all 0.85 0.80 0.81 0.94 0.92 0.89 0.90 0.97
sf_bol 0.80 0.85 0.63 0.93 0.89 0.92 0.77 0.97
sf_crt 0.72 0.70 0.54 0.94 0.83 0.82 0.70 0.97
sv_atl 0.78 0.63 0.78 0.92 0.87 0.78 0.87 0.96
sv_phl 0.62 0.75 0.21 0.91 0.72 0.86 0.34 0.95
sv_aus 0.69 0.76 0.38 0.92 0.79 0.86 0.55 0.96
tt_pro 0.87 0.77 0.88 0.96 0.93 0.87 0.94 0.98

The metrics are computed at the pixel level, so each region contributes to the global scores in proportion to its number of test pixels. The table below shows the resulting weight of each region:

data # images (test) # pixels weight (%) mIoU IoU_hard IoU_soft IoU_other
sf_bol 25 7056000000 39.2 0.80 0.85 0.63 0.93
sf_crt 25 1912699566 10.6 0.72 0.70 0.54 0.94
sv_atl 33 1136559093 6.3 0.78 0.63 0.78 0.92
sv_phl 24 866520651 4.8 0.62 0.75 0.21 0.91
sv_aus 59 1944497328 10.8 0.69 0.76 0.38 0.92
tt_pro 11 5079158784 28.2 0.87 0.77 0.88 0.96
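The weight column follows directly from the pixel counts, and the sketch below reproduces it. It also includes a hypothetical `pooled_iou` helper that shows one way pixel-level weighting arises: summing per-region confusion counts before dividing, so regions with more pixels dominate. That the global scores were computed exactly this way is an assumption.

```python
import numpy as np

# Test-set pixel counts per region, copied from the table above.
PIXELS = {
    "sf_bol": 7_056_000_000,
    "sf_crt": 1_912_699_566,
    "sv_atl": 1_136_559_093,
    "sv_phl": 866_520_651,
    "sv_aus": 1_944_497_328,
    "tt_pro": 5_079_158_784,
}


def region_weights(pixels: dict) -> dict:
    """Share of all test pixels contributed by each region, in percent."""
    total = sum(pixels.values())
    return {region: 100 * n / total for region, n in pixels.items()}


def pooled_iou(confusions):
    """Per-class IoU from summed per-region pixel confusion matrices.

    Summing counts before dividing is what lets large regions dominate.
    """
    cm = np.sum(confusions, axis=0)
    tp = np.diag(cm).astype(float)
    # TP + FP + FN = column sum + row sum - TP
    return tp / (cm.sum(axis=0) + cm.sum(axis=1) - tp)
```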

Model size vs Model accuracy

The table below compares the YOLOv8 model sizes (n = nano, s = small, m = medium, l = large, x = extra-large). All of them were trained on the same training set and evaluated on the same test set.

model size mIoU IoU_hard IoU_soft IoU_other mDice Dice_hard Dice_soft Dice_other
x 0.85 0.79 0.81 0.94 0.92 0.88 0.90 0.97
l 0.85 0.80 0.81 0.94 0.92 0.89 0.90 0.97
m 0.85 0.80 0.80 0.94 0.92 0.89 0.89 0.97
s 0.84 0.78 0.80 0.93 0.91 0.88 0.89 0.98
n 0.83 0.77 0.80 0.93 0.91 0.87 0.89 0.97

As the table above shows, the large (l) model performs best. As model size decreases, performance degrades only slightly, from an mIoU of 0.85 to 0.83 for the nano (n) model. Smaller models, however, run faster and fit on smaller hardware devices.

Conclusion

YOLOv8 proved a strong fit for this instance segmentation task. Even its smaller variants stay accurate while being light enough for modest hardware and, potentially, live underwater video streams, which makes it practical for real deployments.

Figure: Benthic segmentation, hard coral

While YOLOv8 offers a robust solution for instance segmentation, regional performance differences, data leakage, and dataset quality all need careful attention. For instance, soft-coral IoU drops to 0.21 on sv_phl and 0.38 on sv_aus. We hope these findings help refine computer vision applications in marine biology and underwater image segmentation.

You can try the segmenter yourself on real benthic imagery with the interactive demo, which runs in your browser.
