libdart-damage
Reference-free ancient DNA damage estimation and library-type classification from raw FASTQ reads.
What it does
libdart-damage estimates ancient-DNA terminal damage directly from raw reads, without a reference genome or read alignment. It measures six damage and fragmentation channels, fits exponential terminal-decay models, cross-validates apparent C→T damage against composition-robust stop-codon evidence, and reports an asymmetry-aware d_max_combined.
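The sketch below makes the terminal-decay idea concrete. It fits proxy(i) = baseline + d_max · exp(−λ·i) to a per-position damage proxy, where i is the distance from the read end. It assumes the proxy has already been tabulated (for example, the T fraction among C+T bases at each 5' position). It estimates the baseline from interior positions and fits the excess by log-linear least squares. `DecayFit`, `fit_terminal_decay` and this fitting method are hypothetical illustrations, not libdart-damage's estimator.

```cpp
// Hypothetical sketch, NOT libdart-damage's estimator: fit
//   proxy(i) = baseline + d_max * exp(-lambda * i)
// to a per-position damage proxy, where i is the distance from the read end.
#include <cmath>
#include <cstddef>
#include <vector>

struct DecayFit {
    double baseline;  // background level, estimated from interior positions
    double d_max;     // excess over background at the terminal position (i = 0)
    double lambda;    // decay rate per position
};

// The last `interior` entries of `profile` are treated as undamaged background.
DecayFit fit_terminal_decay(const std::vector<double>& profile, std::size_t interior) {
    const std::size_t n = profile.size();
    if (interior == 0 || interior >= n) return {0.0, 0.0, 0.0};

    double baseline = 0.0;
    for (std::size_t i = n - interior; i < n; ++i) baseline += profile[i];
    baseline /= static_cast<double>(interior);

    // Ordinary least squares of ln(excess) on i, using positions with positive excess.
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0, m = 0.0;
    for (std::size_t i = 0; i + interior < n; ++i) {
        const double excess = profile[i] - baseline;
        if (excess <= 0.0) continue;
        const double x = static_cast<double>(i);
        const double y = std::log(excess);
        sx += x; sy += y; sxx += x * x; sxy += x * y; m += 1.0;
    }
    const double denom = m * sxx - sx * sx;
    if (m < 2.0 || denom == 0.0) return {baseline, 0.0, 0.0};  // no usable terminal excess

    const double slope = (m * sxy - sx * sy) / denom;
    const double intercept = (sy - slope * sx) / m;
    return {baseline, std::exp(intercept), -slope};
}
```

Log-linear least squares is the simplest closed-form choice. However, it over-weights the small, noisy excesses in the tail, so a count-weighted or nonlinear fit is usually more robust.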
It also classifies each library as double-stranded (DS), single-stranded (SS), or UNKNOWN using a four-channel Bayesian information criterion (BIC) classifier. UNKNOWN is the expected result when no damage model clearly beats the null (no-damage) model.
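A BIC decision of this kind can be sketched with two channels. In the usual aDNA pattern, double-stranded libraries show G→A excess at 3' ends, while single-stranded libraries show C→T excess at both ends. Each hypothesis therefore adds an exponential decay model to the 3' channel it predicts. The channel inputs, the `min_delta_bic` threshold of 10, and all function names below are assumptions for illustration. libdart-damage's classifier uses four channels and its own models.

```cpp
// Hypothetical sketch of BIC-based library-type calling. NOT libdart-damage's
// four-channel classifier: only two 3'-end channels are modeled here.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// RSS of the null model: a constant (the channel mean), i.e. no terminal decay.
double rss_null(const std::vector<double>& y) {
    double mean = 0.0;
    for (double v : y) mean += v;
    mean /= static_cast<double>(y.size());
    double rss = 0.0;
    for (double v : y) rss += (v - mean) * (v - mean);
    return rss;
}

// RSS of y[i] = b + D * exp(-lambda * i). lambda is scanned on a grid; for each
// lambda, (b, D) are solved exactly by ordinary least squares.
double rss_decay(const std::vector<double>& y) {
    const double n = static_cast<double>(y.size());
    double best = std::numeric_limits<double>::infinity();
    for (int step = 1; step <= 40; ++step) {
        const double lambda = 0.05 * step;
        double sx = 0, sxx = 0, sy = 0, sxy = 0;
        for (std::size_t i = 0; i < y.size(); ++i) {
            const double x = std::exp(-lambda * static_cast<double>(i));
            sx += x; sxx += x * x; sy += y[i]; sxy += x * y[i];
        }
        const double det = n * sxx - sx * sx;
        if (std::fabs(det) < 1e-12) continue;
        const double d = (n * sxy - sx * sy) / det;
        const double b = (sy - d * sx) / n;
        double rss = 0.0;
        for (std::size_t i = 0; i < y.size(); ++i) {
            const double r = y[i] - (b + d * std::exp(-lambda * static_cast<double>(i)));
            rss += r * r;
        }
        best = std::min(best, rss);
    }
    return best;
}

// Gaussian-error BIC: n * ln(RSS / n) + k * ln(n); RSS/n is floored to avoid ln(0).
double bic(double rss, std::size_t n, int k) {
    const double nn = static_cast<double>(n);
    return nn * std::log(std::max(rss / nn, 1e-12)) + k * std::log(nn);
}

// ct3 / ga3: C->T-like and G->A-like excess by distance from the 3' end.
// DS predicts decay in ga3 only; SS predicts decay in ct3 only.
std::string classify_library(const std::vector<double>& ct3,
                             const std::vector<double>& ga3,
                             double min_delta_bic = 10.0) {
    const std::size_t n = ct3.size() + ga3.size();
    const double null_bic = bic(rss_null(ct3) + rss_null(ga3), n, 2);
    const double ds_bic = bic(rss_null(ct3) + rss_decay(ga3), n, 4);
    const double ss_bic = bic(rss_decay(ct3) + rss_null(ga3), n, 4);
    const double best = std::min(ds_bic, ss_bic);
    if (null_bic - best < min_delta_bic) return "UNKNOWN";  // nothing beats the null
    return ds_bic < ss_bic ? "DS" : "SS";
}
```

A decay model contains the null as a special case (D = 0), so it always fits at least as well. The k·ln(n) penalty and the ΔBIC threshold are what let flat, noise-only profiles fall back to UNKNOWN.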
Quick start
```cpp
#include <dart/frame_selector_decl.hpp>

#include <iostream>
#include <string>
#include <vector>

int main() {
    // Input: one sequence string per read
    std::vector<std::string> reads = { "ACGTCTAGCT..." /* , ... */ };

    // One-shot: compute the profile from a vector of sequences
    dart::SampleDamageProfile profile =
        dart::FrameSelector::compute_sample_profile(reads);

    // Streaming: accumulate reads incrementally, then finalize once
    dart::SampleDamageProfile streamed{};
    for (const auto& seq : reads)
        dart::FrameSelector::update_sample_profile(streamed, seq);
    dart::FrameSelector::finalize_sample_profile(streamed);

    // Results
    std::cout << "D_max (5'): " << profile.d_max_5prime << "\n";
    std::cout << "D_max (3'): " << profile.d_max_3prime << "\n";
    std::cout << "Library: " << profile.library_type_str() << "\n";
    // library_type_str(): "double-stranded" | "single-stranded" | "unknown"
}
```
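A streaming API like this works when the accumulated statistics are additive counts and only the final ratios need every read. The hypothetical `TerminalCounts` accumulator below illustrates that pattern with per-position C and T counts at the 5' end. It assumes uppercase bases, and its 15-position window is an arbitrary choice. It is not libdart-damage's internal data structure.

```cpp
// Hypothetical sketch, NOT libdart-damage's internal data structure.
#include <array>
#include <cstddef>
#include <string>

constexpr std::size_t kPositions = 15;  // assumed 5'-end window

struct TerminalCounts {
    std::array<long long, kPositions> c{};    // reads with 'C' at position i
    std::array<long long, kPositions> t{};    // reads with 'T' at position i
    std::array<double, kPositions> t_frac{};  // T / (C + T), set by finalize()

    // Streaming step: O(kPositions) work and constant memory per read.
    void update(const std::string& seq) {
        const std::size_t len = seq.size() < kPositions ? seq.size() : kPositions;
        for (std::size_t i = 0; i < len; ++i) {
            if (seq[i] == 'C') ++c[i];
            else if (seq[i] == 'T') ++t[i];
        }
    }
    // Counts are additive, so partial accumulators (e.g. per thread or per file)
    // can be combined before finalizing.
    void merge(const TerminalCounts& other) {
        for (std::size_t i = 0; i < kPositions; ++i) {
            c[i] += other.c[i];
            t[i] += other.t[i];
        }
    }
    // Ratios are computed once, after all reads have been seen.
    void finalize() {
        for (std::size_t i = 0; i < kPositions; ++i) {
            const long long total = c[i] + t[i];
            t_frac[i] = total > 0 ? static_cast<double>(t[i]) / static_cast<double>(total) : 0.0;
        }
    }
};
```

Deferring the division to `finalize()` is the design choice that matters. Averaging ratios computed per batch would weight small batches too heavily.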
Key features
| Feature | Description |
|---|---|
| Reference-free | Works directly on FASTQ, no BAM, no alignment |
| D_max estimation | Calibrated, metaDMG-comparable damage rates |
| Library-type detection | BIC classifier: DS / SS / UNKNOWN |
| Multi-channel validation | 6 damage channels (A, B, B₃′, C, D, E) cross-validate signal |
| GC-stratified estimation | Separates ancient from modern DNA in mixed samples |
| Streaming API | Incremental updates for memory-efficient processing |
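One way to picture GC stratification is to bin reads by GC fraction and keep a terminal damage proxy per bin. Bins dominated by damaged reads then stand apart from bins dominated by undamaged ones. This sketch assumes five equal-width bins and uses the T fraction among reads that start with C or T as a crude proxy. `GcStratifiedProxy` and its members are hypothetical, and the library's actual stratification may differ.

```cpp
// Hypothetical sketch of GC-stratified accumulation, NOT libdart-damage's method.
#include <array>
#include <cstddef>
#include <string>

constexpr std::size_t kGcBins = 5;  // assumed: equal-width bins over [0, 1]

double gc_fraction(const std::string& seq) {
    if (seq.empty()) return 0.0;
    std::size_t gc = 0;
    for (char b : seq)
        if (b == 'G' || b == 'C') ++gc;
    return static_cast<double>(gc) / static_cast<double>(seq.size());
}

struct GcStratifiedProxy {
    std::array<long long, kGcBins> c0{};  // reads starting with 'C', per GC bin
    std::array<long long, kGcBins> t0{};  // reads starting with 'T', per GC bin

    static std::size_t bin_of(double gc) {
        const auto b = static_cast<std::size_t>(gc * kGcBins);
        return b < kGcBins ? b : kGcBins - 1;  // gc == 1.0 goes to the top bin
    }
    void update(const std::string& seq) {
        if (seq.empty()) return;
        const std::size_t b = bin_of(gc_fraction(seq));
        if (seq[0] == 'C') ++c0[b];
        else if (seq[0] == 'T') ++t0[b];
    }
    // T fraction at the first base in bin b; -1 when the bin has no C/T starts.
    double t_fraction(std::size_t b) const {
        const long long total = c0[b] + t0[b];
        return total > 0 ? static_cast<double>(t0[b]) / static_cast<double>(total) : -1.0;
    }
};
```

A fuller estimator would compare each bin's terminal value with that bin's own interior baseline, because base composition itself shifts with GC content.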
Validated performance
Tested on 315 Mediterranean sediment aDNA libraries (two independent datasets):
| Dataset | Correct | UNKNOWN | Wrong | Accuracy (excl. UNKNOWN) |
|---|---|---|---|---|
| Dataset 1 (91 samples) | 88 | 3 | 0 | 100% |
| Dataset 2 (224 samples) | 193 | 28 | 3 | 98.5% |
UNKNOWN: no damage model beat the null model. These are zero-damage libraries whose type cannot be inferred from sequence alone, and they are excluded from the accuracy column.
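The accuracy column counts only determined calls, so UNKNOWN libraries enter neither the numerator nor the denominator. Worked from the table:

```math
\begin{aligned}
\text{Accuracy} &= \frac{\text{Correct}}{\text{Correct} + \text{Wrong}} \\
\text{Dataset 1:}\quad \frac{88}{88 + 0} &= 100\% \\
\text{Dataset 2:}\quad \frac{193}{193 + 3} = \frac{193}{196} &\approx 98.5\% \\
\text{UNKNOWN rate:}\quad \frac{3}{91} \approx 3.3\%, \qquad \frac{28}{224} &= 12.5\%
\end{aligned}
```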
Build
```cmake
# CMakeLists.txt
find_package(dart-damage REQUIRED)
target_link_libraries(your_target PRIVATE dart-damage)
```
Or as a CMake FetchContent dependency: