API Reference
All public symbols live in the dart namespace. Include <dart/frame_selector_decl.hpp>.
FrameSelector
The main entry point. All methods are static.
compute_sample_profile
One-shot profile computation from a vector of sequences. Internally calls reset_sample_profile, iterates update_sample_profile, then finalize_sample_profile.
Parameters
| Name | Type | Description |
|---|---|---|
| sequences | const std::vector<std::string>& | DNA sequences (ACGT). Reliable estimation requires enough read length and coverage to define both terminal (positions 0–14) and interior (positions 30 to L-30) regions. |
Returns: a fully populated SampleDamageProfile. Call profile.is_valid() to check whether enough reads were processed (≥ 1000).
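The read-length requirement can be made concrete with a small pre-check. This is only a sketch: `interior_length` and `reads_with_interior` are hypothetical helpers, not part of dart, and treating the interior as the half-open range [30, L - 30) is an assumption about how the bounds are meant.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumption: the interior region is the half-open range [30, L - 30),
// so it is non-empty only for reads longer than 60 bases.
inline std::size_t interior_length(std::size_t read_length) {
    return read_length > 60 ? read_length - 60 : 0;
}

// Hypothetical pre-check: counts reads that contribute interior positions.
inline std::size_t reads_with_interior(const std::vector<std::string>& sequences) {
    std::size_t n = 0;
    for (const auto& s : sequences)
        if (interior_length(s.size()) > 0) ++n;
    assert(n <= sequences.size());  // cannot count more reads than were given
    return n;
}
```

Under that assumption, a sample made up mostly of reads of 60 bases or fewer would contribute little to the interior baseline even when the total read count passes the 1000-read validity threshold.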
update_sample_profile
Accumulate one read into profile.
Safe parallel pattern: update separate SampleDamageProfile objects in parallel threads, then merge them on the main thread with merge_sample_profiles. Do not call this on the same profile object from multiple threads without external synchronization.
update_sample_profile_weighted
static void update_sample_profile_weighted(
SampleDamageProfile& profile,
const std::string& seq,
float weight);
Weighted accumulation for alignability-weighted damage estimation.
Note: Only accumulates Channel A (T/(T+C), A/(A+G)), interior baseline, codon-position, and CpG counts. Channel B/C/D/E codon counts, hexamers, and GC-bin stratified data are not accumulated. Profiles built with this method will have channel_b_valid = false and GC-stratified fields unpopulated after finalization.
finalize_sample_profile
Compute all derived statistics from accumulated counts: fits the exponential decay, estimates D_max, runs the BIC library-type classifier. Must be called once after all update_* calls.
merge_sample_profiles
Merge raw counts from src into dst. Useful for parallel accumulation. Call finalize_sample_profile(dst) after merging.
reset_sample_profile
Zero the main accumulators used by standard damage estimation and library classification.
Important: reset_sample_profile does not currently clear every auxiliary diagnostic accumulator, including the Channel B₃′/C/D/E, GC-stratified, and alignability-weighted fields. For guaranteed clean reuse, prefer constructing a fresh SampleDamageProfile{} (zero-initialized) rather than calling reset_sample_profile on a previously populated object.
SampleDamageProfile
All fields are public. Key results after finalize_sample_profile:
Primary damage estimates
| Field | Type | Description |
|---|---|---|
| d_max_5prime | float | Calibrated D_max at 5' end: \(A/(1-b)\) |
| d_max_3prime | float | Calibrated D_max at 3' end |
| d_max_combined | float | Final asymmetry-aware D_max estimate |
| lambda_5prime | float | Fitted decay constant \(\lambda\) at 5' |
| lambda_3prime | float | Fitted decay constant \(\lambda\) at 3' |
| asymmetry | float | \|d5 - d3\| / mean(d5, d3); values > 0.5 are flagged as suspicious |
| d_max_source_str() | const char* | Source used for d_max_combined: "average", "5prime_only", "3prime_only", "channel_b_structural", "channel_b3_structural", "min_asymmetry", "max_ss_asymmetry", "none" |
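The asymmetry formula in the table can be checked by hand with a few lines of standalone code. The helpers below are hypothetical and do not come from dart; returning 0 when both estimates are zero is an assumption made here to avoid dividing by zero.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not part of the dart API: relative asymmetry between
// the 5' and 3' D_max estimates, |d5 - d3| / mean(d5, d3).
inline double relative_asymmetry(double d5, double d3) {
    assert(d5 >= 0.0 && d3 >= 0.0);  // damage rates are non-negative
    const double mean = 0.5 * (d5 + d3);
    if (mean <= 0.0) return 0.0;  // assumption: no signal at either end means no asymmetry
    return std::fabs(d5 - d3) / mean;
}

// Applies the 0.5 threshold quoted in the table.
inline bool is_suspicious_asymmetry(double d5, double d3) {
    return relative_asymmetry(d5, d3) > 0.5;
}
```

For example, d5 = 0.30 and d3 = 0.10 give 0.20 / 0.20 = 1.0, which is above the threshold, while d5 = 0.20 and d3 = 0.18 give roughly 0.11, which is not.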
Library-type classification
| Field | Type | Description |
|---|---|---|
| library_type | LibraryType | DOUBLE_STRANDED, SINGLE_STRANDED, or UNKNOWN |
| library_type_auto_detected | bool | true if set by classifier, false if user-forced |
| library_type_str() | const char* | "double-stranded", "single-stranded", "unknown" |
LibraryType::UNKNOWN means no model beat the null (M_bias): insufficient damage signal for a confident call. Treat as DS for deduplication unless metadata is available.
BIC model scores (library-type classifier)
| Field | Type | Description |
|---|---|---|
| library_bic_bias | double | BIC of M_bias (null model) |
| library_bic_ds | double | Best DS model BIC |
| library_bic_ss | double | Best SS model BIC |
| library_bic_mix | double | BIC of M_SS_full (4-channel unconstrained) |
library_bic_ds - library_bic_ss > 0 means SS is favoured, because the model with the lower BIC is preferred. Values reach ~10⁹ at high coverage; they are stored as double to preserve precision.
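The sign convention and the precision remark can both be illustrated in isolation. The functions below are hypothetical and only mirror the comparison described here. The float behaviour follows from IEEE 754 single precision, where neighbouring representable values near 10⁹ are 64 apart.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: lower BIC is better, so a positive (DS - SS) gap
// means the single-stranded model fits better.
inline bool favours_single_stranded(double bic_ds, double bic_ss) {
    assert(std::isfinite(bic_ds) && std::isfinite(bic_ss));
    return bic_ds - bic_ss > 0.0;
}

// Illustrates the precision argument: near 1e9, adjacent floats are 64 apart,
// so a 10-unit BIC gap vanishes in float but survives in double.
inline bool gap_survives_in_float(double bic_a, double bic_b) {
    return static_cast<float>(bic_a) != static_cast<float>(bic_b);
}
```

With scores of 1e9 + 10 and 1e9, the double comparison still sees a positive gap, whereas both values round to the same float.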
Per-channel classifier amplitudes and ΔBIC
| Field | Description |
|---|---|
| libtype_amp_ct5 / libtype_dbic_ct5 | 5' C→T fitted amplitude and ΔBIC |
| libtype_amp_ga3 / libtype_dbic_ga3 | 3' G→A smooth decay (pos 1-10) |
| libtype_amp_ga0 / libtype_dbic_ga0 | 3' G→A pos-0 spike |
| libtype_amp_ct3 / libtype_dbic_ct3 | 3' C→T (SS original-orientation signal) |
ΔBIC > 0 means the alt (decay) model is preferred over the null for that channel.
Per-position arrays
All arrays have 15 elements, indexed 0–14 from the read terminus.
| Field | Description |
|---|---|
| damage_rate_5prime[p] | C→T excess rate at 5' position p |
| damage_rate_3prime[p] | G→A excess rate at 3' position p |
| t_freq_5prime[p] | Before finalize_sample_profile: raw T count. After: T/(T+C) ratio (normalized in-place). |
| tc_total_5prime[p] | T+C coverage at 5' position p (raw count; not normalized by finalization) |
| a_freq_3prime[p] | Before finalize_sample_profile: raw A count. After: A/(A+G) ratio (normalized in-place). |
| ag_total_3prime[p] | A+G coverage at 3' position p (raw count; not normalized by finalization) |
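Because t_freq_5prime and a_freq_3prime change meaning at finalization, it may help to see the in-place step spelled out. The sketch below is a hypothetical stand-in, not dart's implementation: the `double` element type and the choice to report 0 for positions without coverage are both assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kTerminalPositions = 15;  // positions 0-14
using PositionArray = std::array<double, kTerminalPositions>;  // element type is an assumption

// Hypothetical stand-in for the in-place step: converts raw T counts into
// T/(T+C) ratios, leaving the coverage array untouched.
inline void normalize_in_place(PositionArray& t_freq, const PositionArray& tc_total) {
    for (std::size_t p = 0; p < kTerminalPositions; ++p) {
        assert(t_freq[p] <= tc_total[p]);  // T count cannot exceed T+C coverage
        // Assumption: positions without coverage report a ratio of 0.
        t_freq[p] = tc_total[p] > 0.0 ? t_freq[p] / tc_total[p] : 0.0;
    }
}
```

Since the coverage arrays keep their raw counts, the original count at a position can still be approximated after finalization as the ratio multiplied by the matching total.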
GC-stratified mixture model
Available after finalize_sample_profile. Requires at least one GC bin with sufficient reads.
| Field | Type | Description |
|---|---|---|
| mixture_pi_ancient | float | Fraction of C-sites in high-damage components |
| mixture_d_ancient | float | Expected damage rate among ancient reads (δ > 5%) |
| mixture_d_population | float | Population-average damage rate across all C-sites |
| mixture_d_reference | float | Damage rate in GC > 50% bins (metaDMG proxy) |
| mixture_K | int | Number of mixture components selected by BIC |
| mixture_converged | bool | Whether the EM algorithm converged |
| gc_stratified_valid | bool | At least one GC bin has a valid estimate |
Validation and reliability flags
| Field | Type | Description |
|---|---|---|
| damage_validated | bool | Joint model evidence supports genuine terminal deamination |
| damage_artifact | bool | Channel A-like enrichment is present, but joint evidence indicates composition/artifact rather than real damage |
| terminal_inversion | bool | Summary flag: terminal damage rate < interior at either end |
| inverted_pattern_5prime / inverted_pattern_3prime | bool | End-specific terminal depletion flags |
| position_0_artifact_5prime / position_0_artifact_3prime | bool | Position-0 depletion with downstream enrichment; likely adapter/ligation artifact |
| composition_bias_5prime / composition_bias_3prime | bool | Control channel rises with the damage channel, suggesting compositional rather than damage-driven enrichment |
| is_valid() | bool | n_reads >= 1000 |
| is_detection_unreliable() | bool | true when inversion or composition-bias flags indicate unreliable reference-free detection |
Utility functions
// Validation state: VALIDATED, CONTRADICTED, or UNVALIDATED
dart::DamageValidationState state = dart::get_damage_validation_state(profile);
// Suppression factor for downstream damage-aware deduplication
float factor = dart::get_damage_suppression_factor(profile);
// 1.0 = validated, 0.5 = unvalidated, 0.0 = artifact
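The relationship between the three validation states and the three factor values can be written out as a lookup. This is a sketch using a hypothetical mirror enum, not dart's own types. In particular, pairing CONTRADICTED with the "artifact" factor of 0.0 is an assumption, since the reference lists the states and the factors separately.

```cpp
#include <cassert>

// Hypothetical mirror of the three states named above; the real enum lives in dart.
enum class ValidationState { Validated, Contradicted, Unvalidated };

// Assumption: CONTRADICTED corresponds to the "artifact" case (factor 0.0).
inline float suppression_factor(ValidationState s) {
    switch (s) {
        case ValidationState::Validated:    return 1.0f;
        case ValidationState::Unvalidated:  return 0.5f;
        case ValidationState::Contradicted: return 0.0f;
    }
    assert(false && "unhandled validation state");
    return 0.0f;
}
```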
Typical usage patterns
Streaming with thread-parallel accumulation
#include <dart/frame_selector_decl.hpp>
#include <thread>
#include <vector>
// Per-thread profiles (n_threads is the number of worker threads)
std::vector<dart::SampleDamageProfile> partial(n_threads);
for (auto& p : partial) dart::FrameSelector::reset_sample_profile(p);
// Fill partial[i] in parallel...
// Merge on main thread; {} zero-initializes the merge target
dart::SampleDamageProfile final_profile{};
dart::FrameSelector::reset_sample_profile(final_profile);
for (const auto& p : partial)
    dart::FrameSelector::merge_sample_profiles(final_profile, p);
dart::FrameSelector::finalize_sample_profile(final_profile);
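The elided fill step could look roughly like the following. To keep the example self-contained, it uses a hypothetical `ToyProfile` with two counters in place of SampleDamageProfile, and `toy_update` / `toy_merge` in place of the dart calls. The strided split of reads across workers is one possible choice, not something the library prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for SampleDamageProfile: a read counter and a T counter.
struct ToyProfile {
    std::size_t n_reads = 0;
    std::size_t t_count = 0;
};

inline void toy_update(ToyProfile& p, const std::string& seq) {
    ++p.n_reads;
    for (char c : seq)
        if (c == 'T') ++p.t_count;
}

inline void toy_merge(ToyProfile& dst, const ToyProfile& src) {
    dst.n_reads += src.n_reads;
    dst.t_count += src.t_count;
}

// Each worker owns partial[i] exclusively, so no locking is needed.
inline ToyProfile accumulate_parallel(const std::vector<std::string>& reads,
                                      std::size_t n_threads) {
    assert(n_threads > 0);
    std::vector<ToyProfile> partial(n_threads);
    std::vector<std::thread> workers;
    workers.reserve(n_threads);
    for (std::size_t i = 0; i < n_threads; ++i) {
        workers.emplace_back([&reads, &partial, i, n_threads] {
            // Strided split: worker i takes reads i, i + n_threads, ...
            for (std::size_t r = i; r < reads.size(); r += n_threads)
                toy_update(partial[i], reads[r]);
        });
    }
    for (auto& w : workers) w.join();

    ToyProfile merged{};  // fresh, zero-initialized merge target
    for (const auto& p : partial) toy_merge(merged, p);
    return merged;
}
```

The merge runs only after every worker has been joined, so the main thread never reads a partial profile that is still being written.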
Forcing library type
dart::SampleDamageProfile profile = /* ... */;
profile.forced_library_type = dart::SampleDamageProfile::LibraryType::DOUBLE_STRANDED;
When forced_library_type != UNKNOWN, downstream code should use the forced type; library_type_auto_detected will be false.
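Downstream code might resolve the effective library type along these lines. The enum and both functions are hypothetical mirrors written for illustration. The double-stranded fallback follows the guidance given for LibraryType::UNKNOWN and is applied here only for deduplication.

```cpp
#include <cassert>

// Hypothetical mirror of the enum values listed in this reference.
enum class LibType { Unknown, DoubleStranded, SingleStranded };

// Precedence: a user-forced type wins; otherwise use the classifier's call.
inline LibType resolve_library_type(LibType forced, LibType detected) {
    return forced != LibType::Unknown ? forced : detected;
}

// For deduplication, an unresolved type falls back to double-stranded.
inline LibType dedup_library_type(LibType forced, LibType detected) {
    const LibType t = resolve_library_type(forced, detected);
    const LibType out = (t == LibType::Unknown) ? LibType::DoubleStranded : t;
    assert(out != LibType::Unknown);  // deduplication always gets a concrete type
    return out;
}
```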