Classifier Integration Guide
libdart-damage ships a C-compatible API (dart/damage_c_api.h) that lets any
classifier link the library regardless of its own C++ standard. The
implementation is compiled as C++17; the header is plain C and can be included
from C or from C++14 and later.
CMake setup
# In your top-level or src/ CMakeLists.txt
add_subdirectory(path/to/libdart-damage)
target_link_libraries(your_classifier PRIVATE dart-damage)
dart-damage sets CXX_STANDARD 17 on its own target only; it does not change
your target's standard.
Two-pass workflow
Pass 1 — estimate sample damage
Feed all reads through the accumulator before classification begins.
#include "dart/damage_c_api.h"
dart_profile_t *profile = dart_profile_create();
if (!profile) { /* OOM */ }
/* For each read in every input file: */
dart_profile_add_read(profile, seq, len);
dart_profile_finalize(profile); /* fit the model — call exactly once */
Inspect the result:
float dmax = dart_profile_dmax(profile); /* 0–1 */
int lib_type = dart_profile_library_type(profile); /* 0=UNKNOWN 1=DS 2=SS */
int validated = dart_profile_damage_validated(profile);
int artifact = dart_profile_damage_artifact(profile);
int reliable = dart_profile_is_reliable(profile);
Only activate correction when the signal is genuine, for example when the estimate is reliable, the damage is validated, and no artifact is flagged.
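A minimal sketch of such a gate, written as a pure function over the values read from the getters above. The helper name should_correct and the min_dmax cutoff are assumptions made for illustration; neither is part of libdart-damage, and the right cutoff depends on your data.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of libdart-damage: decide whether Pass 2
 * should run, given values already read from the profile getters.
 * min_dmax is a caller-chosen cutoff (an assumption, not a library default). */
bool should_correct(int reliable, int validated, int artifact,
                    float dmax, float min_dmax)
{
    if (!reliable)  return false;  /* estimate is not statistically reliable */
    if (!validated) return false;  /* damage signal was not validated */
    if (artifact)   return false;  /* signal was flagged as a likely artifact */
    return dmax >= min_dmax;       /* damage is large enough to act on */
}
```

A caller would then wrap Pass 2 in something like `if (should_correct(reliable, validated, artifact, dmax, 0.05f)) { ... }`, where 0.05 is only a placeholder value.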
Pass 2 — correct or mask each read
Two functions are available. Both use the same per-position damage probabilities
(empirical damage_rate_5prime/3prime arrays for the first 15 positions,
exponential extrapolation d_max × e^{-λ × pos} beyond that).
dart_correct_read — back-convert damaged bases to their inferred originals:
char corrected[MAX_READ_LEN + 1];
size_t n_fixes = dart_correct_read(profile,
seq, len,
corrected,
0.30f); /* confidence threshold */
/* use corrected instead of seq */
Reverts T → C at 5′ positions and A → G at 3′ positions where the damage probability ≥ threshold. Appropriate when you want to recover the likely original sequence before alignment or classification.
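The rule can be illustrated with a small stand-alone sketch. This is not the library's implementation: the function name sketch_correct and the caller-supplied probability arrays p5 and p3 are assumptions, and how the library combines the 5′ and 3′ signals at a position where both apply is not documented here.

```c
#include <stddef.h>

/* Illustrative only: NOT the library's implementation.
 * p5[i] is the assumed C->T damage probability i bases from the 5' end,
 * p3[j] the assumed G->A damage probability j bases from the 3' end.
 * Both arrays must hold at least len entries; out must hold len + 1 bytes. */
size_t sketch_correct(const char *seq, size_t len,
                      const float *p5, const float *p3,
                      float threshold, char *out)
{
    size_t fixes = 0;
    for (size_t i = 0; i < len; ++i) {
        size_t from3 = len - 1 - i;      /* distance from the 3' end */
        char c = seq[i];
        if (c == 'T' && p5[i] >= threshold) {
            c = 'C';                     /* undo a likely C->T change */
            ++fixes;
        } else if (c == 'A' && p3[from3] >= threshold) {
            c = 'G';                     /* undo a likely G->A change */
            ++fixes;
        }
        out[i] = c;
    }
    out[len] = '\0';
    return fixes;
}
```

With a threshold of 0.30, a T at the first position with p5[0] = 0.40 is reverted to C, while a T at the second position with p5[1] = 0.20 is left alone.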
dart_mask_read — replace damaged positions with a mask character:
char masked[MAX_READ_LEN + 1];
size_t n_masked = dart_mask_read(profile,
seq, len,
masked,
0.30f, /* confidence threshold */
'N'); /* mask character */
/* use masked instead of seq */
Writes mask_char (typically 'N') at any position where the C→T or G→A
damage probability ≥ threshold, leaving all other bases unchanged. Useful
for k-mer classifiers (e.g. Metabuli) that skip k-mers containing 'N':
damaged positions are excluded from k-mer extraction without assuming the
original base. Prefer masking over correction when the downstream tool
handles ambiguous bases natively.
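To see why masking suits such tools, here is a generic sketch of N-aware k-mer counting. It is not taken from Metabuli or libdart-damage; count_clean_kmers is a hypothetical name, and real classifiers may treat ambiguous bases differently.

```c
#include <stddef.h>

/* Count k-mers that contain no 'N', using a single left-to-right scan.
 * Generic illustration; not taken from Metabuli or libdart-damage. */
size_t count_clean_kmers(const char *seq, size_t len, size_t k)
{
    if (k == 0 || len < k) return 0;
    size_t clean = 0;
    size_t run = 0;                  /* length of the current N-free stretch */
    for (size_t i = 0; i < len; ++i) {
        run = (seq[i] == 'N') ? 0 : run + 1;
        if (run >= k) ++clean;       /* the k-mer ending at i is N-free */
    }
    return clean;
}
```

Each masked position removes up to k overlapping k-mers from consideration, so a high mask rate near the read ends trades sensitivity for fewer damage-induced mismatches.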
Cleanup
Call dart_profile_destroy(profile) once Pass 2 has finished. Passing NULL is safe.
Threading
dart_profile_add_read is not thread-safe. For multi-threaded Pass 1,
create one profile per thread and merge before finalizing:
#include "dart/frame_selector_decl.hpp"
std::vector<dart_profile_t *> per_thread(n_threads);
for (auto &p : per_thread) p = dart_profile_create();
/* ... fill per_thread[tid] from each thread's reads ... */
/* Merge into thread 0's profile (single-threaded): */
for (int i = 1; i < n_threads; ++i)
dart::FrameSelector::merge_sample_profiles(
per_thread[0]->profile, per_thread[i]->profile);
dart_profile_finalize(per_thread[0]);
/* use per_thread[0] for Pass 2 */
dart_correct_read is thread-safe — it only reads the finalized profile.
API reference
dart_profile_create
Allocate a new profile accumulator. Returns NULL on OOM.
dart_profile_destroy
Free all resources. Safe to call with NULL.
dart_profile_add_read
Accumulate one DNA read into the profile. Silently ignored after
dart_profile_finalize. Not thread-safe (see Threading above).
dart_profile_finalize
Fit the exponential decay model. Must be called exactly once before any
getter or dart_correct_read / dart_mask_read. Subsequent calls are no-ops.
Getters (require finalized profile)
| Function | Returns |
|---|---|
| dart_profile_dmax(p) | Combined D_max (0–1) |
| dart_profile_lambda5(p) | 5′ decay constant λ |
| dart_profile_lambda3(p) | 3′ decay constant λ |
| dart_profile_library_type(p) | 0=UNKNOWN, 1=DS, 2=SS |
| dart_profile_damage_validated(p) | 1 if both channels agree |
| dart_profile_damage_artifact(p) | 1 if signal looks like adapter bias |
| dart_profile_is_reliable(p) | 1 if estimate is statistically reliable |
All return safe defaults (0 / 0.0) before finalization.
dart_correct_read
size_t dart_correct_read(const dart_profile_t *p,
const char *seq, size_t len,
char *out_buf,
float confidence_threshold);
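Writes a corrected copy of seq to out_buf, reverting T → C at 5′ positions and
A → G at 3′ positions where the position-dependent damage probability ≥
confidence_threshold. out_buf must be at least
len + 1 bytes. Returns the number of bases corrected. Returns 0 if the
profile is not finalized.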
dart_mask_read
size_t dart_mask_read(const dart_profile_t *p,
const char *seq, size_t len,
char *out_buf,
float confidence_threshold,
char mask_char);
Writes a copy of seq to out_buf with mask_char (e.g. 'N') at positions where the
position-dependent damage probability ≥ confidence_threshold. out_buf
must be at least len + 1 bytes. Returns the number of bases masked.
Returns 0 if the profile is not finalized.
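Both functions require out_buf to hold at least len + 1 bytes (the extra byte is presumably for a terminating NUL). A fixed-size stack buffer works when read length is bounded. For unbounded input, one option is a growable buffer reused across reads, sketched below. ensure_out_buf is a hypothetical helper, not part of libdart-damage.

```c
#include <stdlib.h>

/* Hypothetical helper, not part of libdart-damage: grow a reusable output
 * buffer so it can hold len bases plus one extra byte.
 * Returns 0 on success, -1 on size overflow or allocation failure. */
int ensure_out_buf(char **buf, size_t *cap, size_t len)
{
    if (len == (size_t)-1) return -1;    /* len + 1 would overflow */
    size_t need = len + 1;
    if (*cap >= need) return 0;          /* already large enough */
    char *grown = realloc(*buf, need);
    if (!grown) return -1;               /* *buf is untouched and still valid */
    *buf = grown;
    *cap = need;
    return 0;
}
```

Start with a NULL buffer and zero capacity, call the helper before each dart_correct_read or dart_mask_read, and free the buffer once after the last read. Each worker thread should own its own buffer.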