Developing an eDNA-based tool for regulatory benthic compliance in Scottish salmon aquaculture (BactMetBar)

Project summary

Academic lead: Scottish Association for Marine Science (SAMS) with RPTU Kaiserslautern-Landau; Institute for Biodiversity and Freshwater Conservation (University of the Highlands and Islands)

Partners: Salmon Scotland, Mowi, Scottish Sea Farms, Scottish Environment Protection Agency (SEPA)

Funders: Sustainable Aquaculture Innovation Centre (SAIC) and Scottish Government Marine Directorate

Project facts

Impact

The statutory eDNA compliance screening tool introduced in 2023 was the first eDNA-based tool used in fish-farm compliance regulation, marking a significant methodological shift from traditional macrobenthic analysis. Its evolution and improvement through the BactMetBar project is a further step in the continued innovation in the use of eDNA to assess and monitor the environmental effects of fish farms in Scotland.

Case study

This project is now complete. The full case study below gives extensive information on the work done and its outcomes.

FULL CASE STUDY

BACKGROUND

Benthic monitoring plays a central role in understanding and safeguarding marine ecosystem health. Examining the invertebrate communities living within and on seabed sediments gives detailed insight into the ecological condition of the seabed. Patterns of species diversity, abundance and composition reveal whether an ecosystem is functioning normally, whether it differs from adjacent areas, and whether it has changed over time. Because many benthic organisms are relatively sedentary and respond predictably to disturbance, they provide an integrated signal of environmental quality.

Within Scottish aquaculture, benthic monitoring is a statutory requirement. Marine fish farms must demonstrate that seabed conditions remain within acceptable ecological limits, as defined (to date) through macrobenthic analysis and calculation of the Infaunal Quality Index (IQI). Under updated regulations, operators are now required to collect and analyse four times as many benthic samples as previously mandated. These samples are processed to identify and catalogue infaunal taxa, from which the IQI is calculated and compliance status determined.

Although well-established, traditional macrobenthic analysis is resource-intensive. Sample washing, sorting, taxonomic identification and data processing require specialist expertise and can take between three and six months to complete. This lag constrains adaptive site management and increases operational costs.

Environmental DNA (eDNA) metabarcoding offers a complementary approach. By analysing DNA fragments present in sediments, it is possible to characterise benthic communities through their molecular associations. In an earlier project funded by SAIC, bacterial eDNA profiles were shown to be highly repeatable and, when analysed using random forest machine learning, capable of predicting macrobenthic IQI.

Delivered through a partnership including the Scottish Association for Marine Science (SAMS), the Technical University of Kaiserslautern, the Institute for Biodiversity and Freshwater Conservation at UHI, Salmon Scotland, Mowi, Scottish Sea Farms and SEPA, the two-phase project aimed to build on this foundation by developing and validating an operational eDNA-based tool for regulatory compliance assessment in Scottish salmon aquaculture.

AIMS

The overall objective of BactMetBar was to develop and validate an eDNA-based method capable of assessing seabed compliance around fish pens in the Scottish salmon sector.

Phase 1 focused on delivering a standard operating procedure (SOP) for sediment sampling and processing, constructing and validating a random forest model to predict IQI from bacterial assemblages, assessing seasonal effects, and testing interlaboratory consistency. The outcome was the R package eDNA2IQI, which predicts IQI from raw input sequence data.

Phase 2 had three specific aims:

To further validate and update the eDNA2IQI algorithm by incorporating additional data and testing whether the inclusion of site identity improved model performance.
To assess whether eDNA-based models could be retrained to predict pen-edge compliance, specifically the presence and abundance of enrichment polychaete reworkers.
To evaluate model performance at low and high IQI values and improve prediction accuracy through curve correction.

OVERVIEW

Phase 1 delivered the core workflow. Project partners agreed and implemented an SOP for sediment collection and processing. Industry partners supplied samples with associated metadata, academic partners undertook 16S rRNA sequencing, and bacterial reads were annotated to the family level.

Using these data, random forest models were trained to predict macrobenthic IQI from bacterial assemblages. Blind testing was undertaken using six full-transect surveys. A ring test evaluated laboratory effects by processing identical samples in three laboratories. Seasonal sampling at a local fish farm examined temporal variability at reference stations and pen-edge locations.

All sequence processing and modelling code was incorporated into the eDNA2IQI R package, enabling upload of SOP-compliant sequence data, automated quality control, denoising, annotation and IQI prediction using multiple random forest algorithms.

Phase 1 met all objectives on time and within budget. The standard operating procedure was implemented successfully, the random forest model demonstrated strong predictive ability, and blind testing confirmed performance on full-transect surveys. Interlaboratory comparison showed negligible differences in IQI predictions despite differences in other diversity metrics. Seasonal effects at reference stations were not mirrored at pen-edge sites, indicating farm-linked patterns dominated bacterial signals.

The eDNA2IQI R package was published alongside the underpinning sequence data and associated IQI scores, and all were made freely available.

Phase 2

Incorporation of additional data

The original eDNA2IQI model was based on 745 samples from 75 sites. Phase 2 incorporated additional datasets using the same training and validation methods described in Wyness et al. (2025). New data included 77 samples from 14 sites generated during Phase 1, plus SEPA audit survey data from 2023 (50 samples, 7 sites), 2024 (25 samples, 4 sites), and one site from 2021 (15 samples). While these new samples lacked accompanying macrobenthic raw data, SEPA-provided IQI scores were available for model training.

The model was retrained and performance metrics recalculated.

The original regression relationship was:

Predicted IQI = 0.137 + 0.746 × Actual IQI

R² = 0.801

RMSE = 0.0974

After incorporation of additional samples, the updated model was:

Predicted IQI = 0.142 + 0.744 × Actual IQI

R² = 0.788

RMSE = 0.1000

The relationship between predicted and actual IQI changed only marginally. While there was a slight reduction in R² and an increase in RMSE, this was consistent with trends observed during earlier model expansions. The marginal loss in fit reflects broader applicability. RMSE values were similar to inherent variability in repeated macrobenthic IQI assessments (~0.10), indicating that model error remains within expected biological and analytical variation.
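The summary statistics quoted above can be reproduced from paired actual and predicted IQI values. The following is a minimal sketch, not the eDNA2IQI implementation: the function name is hypothetical, R² is assumed to be the squared correlation of the fitted line, and RMSE is assumed to be measured directly between predicted and actual values. The input values are made up so that they sit exactly on the updated regression line.

```python
from math import sqrt


def fit_summary(actual, predicted):
    """Fit predicted = intercept + slope * actual and report R^2 and RMSE."""
    n = len(actual)
    mean_x = sum(actual) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in actual)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(actual, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Assumption: R^2 is the squared Pearson correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    # Assumption: RMSE compares predictions with actual values (1:1 line).
    rmse = sqrt(sum((y - x) ** 2 for x, y in zip(actual, predicted)) / n)
    return {"intercept": intercept, "slope": slope,
            "r_squared": r_squared, "rmse": rmse}


# Illustrative values only, placed exactly on the updated regression line.
actual_iqi = [0.1, 0.3, 0.5, 0.7, 0.9]
predicted_iqi = [0.142 + 0.744 * x for x in actual_iqi]
summary = fit_summary(actual_iqi, predicted_iqi)
```

Because the illustrative points lie exactly on the line, the sketch recovers the quoted slope and intercept with an R² of 1. Real data scatter around the line, which is what lowers R² and raises RMSE.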

Testing inclusion of site identity

The original model was trained without repeat visits to the same production cycle sites. During the extension, repeat-visit data became available, allowing investigation of whether including site identity as a predictor would improve performance.

The original 745-sample dataset was used. For each site, samples were split into two subsets (A and B) spanning the site’s IQI range. Site names were converted to dummy variables and incorporated into the random forest training procedure using a leave-one-site-out framework. The revised model was implemented within the eDNA2IQI package and applied to Phase 2 test data, both with and without site information included.
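Converting site names to dummy variables means adding one 0/1 indicator column per site, so that the model can use site identity as a predictor. A minimal sketch of that encoding is given here; the function name and site labels are hypothetical, and the eDNA2IQI package may encode sites differently.

```python
def one_hot_sites(site_names):
    """Return (columns, rows) with one 0/1 indicator column per distinct site."""
    columns = sorted(set(site_names))
    position = {name: i for i, name in enumerate(columns)}
    rows = []
    for name in site_names:
        row = [0] * len(columns)
        row[position[name]] = 1  # mark the site this sample came from
        rows.append(row)
    return columns, rows


# Hypothetical site labels, not taken from the project data.
columns, rows = one_hot_sites(["SiteA", "SiteB", "SiteA", "SiteC"])
```

Each row contains exactly one 1, so these columns can be appended to the bacterial family abundances without changing the number of samples.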

The new dataset included 152 samples from 23 sites, 14 of which were repeat visits. Comparison of predictions from models with and without site identity showed near-identical results across the IQI range (0.2-0.7). Inclusion of the site as a predictor did not meaningfully alter IQI predictions. At this stage of development, site information does not enhance model performance.

Pen-edge compliance modelling

At pen-edge locations, regulatory compliance requires the presence of at least two species of enrichment polychaete reworkers, such as Capitella spp. and Malacoceros spp. The macrobenthic IQI is not used for pen-edge compliance because extreme organic enrichment can reduce diversity and limit index reliability.

Macrobenthic compliance data were extracted from partner datasets, focusing on pen-edge samples. Response variables included the number of enrichment polychaete taxa, total abundance per m², and pass/fail compliance status. Failures were defined as abundance <1000 individuals per m² and/or 0-1 enrichment species.
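The pass/fail definition above can be written as a simple rule. This sketch uses the two thresholds stated in the text; the function and constant names are hypothetical.

```python
MIN_ABUNDANCE_PER_M2 = 1000  # from the text: fail if abundance < 1000 per m2
MIN_ENRICHMENT_SPECIES = 2   # from the text: fail if only 0-1 enrichment species


def pen_edge_status(abundance_per_m2, n_enrichment_species):
    """Classify one pen-edge sample against the two compliance criteria."""
    fails_abundance = abundance_per_m2 < MIN_ABUNDANCE_PER_M2
    fails_species = n_enrichment_species < MIN_ENRICHMENT_SPECIES
    return {
        "fails_abundance": fails_abundance,
        "fails_species": fails_species,
        # "and/or": failing either criterion is enough to fail overall.
        "status": "fail" if (fails_abundance or fails_species) else "pass",
    }


# Hypothetical samples for illustration.
sample_ok = pen_edge_status(abundance_per_m2=4500, n_enrichment_species=3)
sample_low_species = pen_edge_status(abundance_per_m2=4500, n_enrichment_species=1)
```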

After rarefaction to 5000 reads and removal of incomplete samples, 153 samples from 68 sites were available. Seven random forest models were constructed using different combinations of categorical, regression and ordinal training variables, including transformations of abundance data. Performance was assessed using Out-of-Bag (OOB) validation and Leave-One-Site-Out (LOSO) validation to examine both internal performance and performance beyond the training sites. Model performance metrics included accuracy, fail recall, fail precision and F1 score. Threshold-based consensus approaches and a final ensemble random forest model were also evaluated.
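Leave-One-Site-Out validation holds back every sample from one site, trains on the remaining sites, and repeats for each site in turn. The splitting step might look like the sketch below; the function name and site labels are hypothetical, and model training itself is omitted.

```python
from collections import defaultdict


def leave_one_site_out(sites):
    """Yield (held_out_site, train_indices, test_indices) for each distinct site."""
    by_site = defaultdict(list)
    for i, site in enumerate(sites):
        by_site[site].append(i)
    for held_out in sorted(by_site):
        test_idx = by_site[held_out]
        train_idx = [i for i, s in enumerate(sites) if s != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical site label for each sample.
sample_sites = ["A", "A", "B", "C", "B"]
splits = list(leave_one_site_out(sample_sites))
```

Because no sample from the held-out site is seen during training, this tests performance at unfamiliar sites, whereas Out-of-Bag validation can draw training and test samples from the same site.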

Among 153 pen-edge samples, 16 samples from 9 sites failed at least one compliance criterion; 7 samples from 4 sites failed both abundance and species criteria.

In OOB testing, the strongest individual model was the regression model trained on log-transformed abundance (Model 6). It achieved 92.2% accuracy, 62.5% fail recall, 62.5% fail precision, and an F1 score of 62.5%. Models trained on non-transformed regression variables failed to identify any failures.
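The four metrics follow from the confusion matrix, treating "fail" as the positive class. The counts in this sketch are not reported in the case study. They are inferred as the whole-number counts consistent with 153 samples, 16 actual failures and the percentages quoted for Model 6. The function name is hypothetical.

```python
def fail_metrics(tp, fp, fn, tn):
    """Accuracy, fail recall, fail precision and F1, with 'fail' as positive."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a model flags no failures at all.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


# Inferred counts (assumption): 10 fails caught, 6 missed, 6 false fails.
model6 = fail_metrics(tp=10, fp=6, fn=6, tn=131)
# A model that predicts no failures scores zero on recall, precision and F1.
no_fails_found = fail_metrics(tp=0, fp=0, fn=16, tn=137)
```

The second call shows why accuracy alone is misleading under class imbalance: a model that never predicts a failure is still correct for 137 of 153 samples.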

Consensus threshold testing showed that requiring failure on one or two models optimised F1 performance. A threshold of one model maximised recall but increased false fails, whereas two-model thresholds increased precision at the expense of recall.
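A threshold-based consensus flags a sample as a fail when at least a chosen number of models predict failure. The sketch below illustrates the trade-off with made-up votes from three models; the function name is hypothetical and the project evaluated seven models.

```python
def consensus_fail(model_votes, threshold):
    """Flag a sample as fail if at least `threshold` models predict fail.

    model_votes: one list per sample of booleans (True = model predicts fail).
    """
    return [sum(votes) >= threshold for votes in model_votes]


# Made-up votes for three samples from three models.
votes = [
    [True, False, False],   # one model predicts fail
    [True, True, False],    # two models predict fail
    [False, False, False],  # no model predicts fail
]
flag_if_any_one = consensus_fail(votes, threshold=1)
flag_if_any_two = consensus_fail(votes, threshold=2)
```

Lowering the threshold flags more samples, which raises recall but admits more false fails. Raising it does the opposite.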

LOSO validation showed lower F1 scores (maximum 53.3%), reflecting reduced transferability under severe class imbalance. The final ensemble random forest model achieved F1 scores of 41.7% (OOB) and 38.1% (LOSO). Model 6 remained the most influential predictor.

Although performance was constrained by limited fail samples and class imbalance, results demonstrate that bacterial eDNA patterns show a detectable signal associated with pen-edge compliance status.

Performance at low and high IQI values

The original eDNA2IQI predictions were constrained between IQI values of approximately 0.20 and 0.75, despite training data spanning nearly the full 0-1 range. To address this compression effect, a linear correction was applied.
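The compression can be illustrated with the updated regression line reported earlier (Predicted IQI = 0.142 + 0.744 × Actual IQI). A slope below 1 with a positive intercept pulls low values up and high values down on average. The sketch evaluates the line at the extremes of the actual IQI range (0.053 and 0.922); it describes the average trend only, not individual predictions.

```python
INTERCEPT = 0.142  # from the updated regression in the text
SLOPE = 0.744      # from the updated regression in the text


def line_prediction(actual_iqi):
    """Average predicted IQI implied by the fitted regression line."""
    return INTERCEPT + SLOPE * actual_iqi


low = line_prediction(0.053)   # about 0.181, above the actual value
high = line_prediction(0.922)  # about 0.828, below the actual value
# Point where the line crosses the 1:1 relationship (predicted == actual).
crossover = INTERCEPT / (1 - SLOPE)  # about 0.555
```

Below the crossover the line over-predicts on average, and above it the line under-predicts, which is the pattern a correction needs to counteract.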

Using the extended dataset, leave-one-site-out testing was conducted. For each iteration, random forest models were trained excluding one site, predictions generated, and a line of best fit between predicted and actual IQI derived. This regression equation was then applied to rotate predictions towards a 1:1 relationship.
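One plausible reading of this correction is to fit a line to the training-site predictions and then invert it for the held-out site. The sketch below shows that reading only. The exact transformation used in eDNA2IQI is not given here and may be more conservative than a full inversion; all names and values are hypothetical.

```python
def fit_line(actual, predicted):
    """Least-squares line predicted = intercept + slope * actual."""
    n = len(actual)
    mean_x = sum(actual) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in actual)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(actual, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def adjust_prediction(predicted_iqi, intercept, slope):
    """Invert the fitted line so that adjusted predictions follow a 1:1 trend."""
    return (predicted_iqi - intercept) / slope


# Made-up values for the sites kept in training during one iteration.
train_actual = [0.10, 0.30, 0.50, 0.70, 0.90]
train_predicted = [0.22, 0.36, 0.52, 0.66, 0.81]
intercept, slope = fit_line(train_actual, train_predicted)

# Raw prediction for a sample from the held-out site (made up).
held_out_raw = 0.25
held_out_adjusted = adjust_prediction(held_out_raw, intercept, slope)
```

Fitting the line without the held-out site keeps the correction independent of the samples it is applied to, mirroring the leave-one-site-out design described above.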

Performance of adjusted and unadjusted predictions was compared using mean absolute error, RMSE, residual analysis and Water Framework Directive (WFD) classification accuracy.

Adjustment of predictions expanded the IQI prediction range from 0.189-0.767 to 0.165-0.785, closer to the actual range of 0.053-0.922. Mean absolute error decreased marginally (0.071 to 0.070), and RMSE improved slightly in site-removed testing (0.098 to 0.097).

WFD classification accuracy improved from 54.9% to 56.4%. The adjusted model substantially improved the representation of ‘Bad’ and ‘High’ categories. The unadjusted model under-predicted ‘Bad’ by 5% and ‘High’ by 14.1%, whereas the adjusted model reduced discrepancies in ‘Bad’ and ‘Poor’ categories to below 1.5%.
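Classification accuracy here means the share of samples whose predicted IQI falls in the same WFD class as the actual IQI. The sketch below shows the calculation. The class boundaries are an assumption, not taken from this case study, and should be checked against current WFD guidance before use. Function names and sample values are hypothetical.

```python
# ASSUMED lower class boundaries; not stated in the source text.
ASSUMED_BOUNDARIES = [
    ("High", 0.75),
    ("Good", 0.64),
    ("Moderate", 0.44),
    ("Poor", 0.24),
    ("Bad", 0.0),
]


def wfd_class(iqi, boundaries=ASSUMED_BOUNDARIES):
    """Return the first class whose lower boundary the IQI meets or exceeds."""
    for name, lower in boundaries:
        if iqi >= lower:
            return name
    return boundaries[-1][0]  # anything below every boundary is the lowest class


def classification_accuracy(actual, predicted, boundaries=ASSUMED_BOUNDARIES):
    """Share of samples where predicted and actual IQI share a class."""
    matches = sum(
        wfd_class(a, boundaries) == wfd_class(p, boundaries)
        for a, p in zip(actual, predicted)
    )
    return matches / len(actual)


# Made-up values for illustration.
actual_iqi = [0.80, 0.50, 0.10, 0.30]
predicted_iqi = [0.70, 0.55, 0.20, 0.35]
accuracy = classification_accuracy(actual_iqi, predicted_iqi)
```

In the made-up data the first sample is 'High' but predicted as 'Good', the kind of miss that range compression produces at the top of the scale.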

In full-site transects, adjustment slightly improved RMSE at four of six sites, though overall R² across all sites combined decreased marginally due to leverage effects.

An adjustment module has been incorporated into eDNA2IQI v3, adding an adjusted IQI output column.

IMPACT

BactMetBar has delivered an operational, standardised and validated eDNA-based tool for predicting macrobenthic IQI from sediment bacterial assemblages. The eDNA2IQI R package enables processing of sequence data and automated IQI prediction, incorporating updated models and curve-corrected outputs.

The tool demonstrates strong predictive performance comparable to inherent variability in traditional macrobenthic IQI assessments. Incorporation of additional datasets increased model generalisability without substantive loss of performance. Inclusion of site identity was shown to be unnecessary. eDNA-based modelling shows measurable potential for assessing pen-edge compliance, although further data would strengthen predictive power.

The package is under consideration by SEPA to enhance the usability, scope and prediction performance of the statutory eDNA compliance screening tool introduced in 2023. That tool was the first eDNA-based tool used in fish-farm compliance regulation, marking a significant methodological shift from traditional macrobenthic analysis. Its evolution and improvement through the BactMetBar project is a further step in the continued innovation in the use of eDNA to assess and monitor the environmental effects of fish farms in Scotland.