Alignment-Based QC Implementation for a Genomics Client

Sequoia Applied Technologies partnered with a leading genomics organization to build a robust alignment-based quality control layer inside their sequencing data pipeline. The objective was to improve visibility into mapping accuracy, coverage consistency, and sample integrity while keeping the design platform-neutral and audit-ready.

We delivered a modular QC system that evaluates alignment outputs, flags anomalies, and produces summaries that can be viewed in dashboards or consumed over APIs. This led to faster detection of quality issues and consistent, reproducible outcomes across sequencing runs.

Overview

What the layer covers

The layer reads BAM or CRAM files, summarizes mapping performance, and flags anomalies early. Results are saved as JSON and CSV per sample, with a batch summary for review.

Mapping

  • Mapped percent and properly paired percent
  • Mean MAPQ and secondary or supplementary rates
  • Chimeric and split read signals
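
As an illustration of the mapping metrics above, here is a minimal sketch that derives them from SAM flag bits and MAPQ values. The flag constants come from the SAM specification. The function name summarize_mapping, the (flag, mapq) input format, and the choice to compute percentages over primary records are assumptions made for this example, not a description of the delivered module.

```python
from statistics import mean

# SAM flag bits as defined by the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def summarize_mapping(records):
    """Summarize (flag, mapq) tuples into mapping metrics.

    Mapped and properly paired percentages use primary records only,
    which is an assumption of this sketch.
    """
    total = len(records)
    primary = [(f, q) for f, q in records if not f & (SECONDARY | SUPPLEMENTARY)]
    mapped = [(f, q) for f, q in primary if not f & UNMAPPED]
    paired = [f for f, _ in primary if f & PAIRED]
    proper = [f for f in paired if f & PROPER_PAIR and not f & UNMAPPED]

    def pct(part, whole):
        # Guard against empty inputs instead of dividing by zero.
        return round(100.0 * part / whole, 2) if whole else 0.0

    return {
        "mapped_pct": pct(len(mapped), len(primary)),
        "properly_paired_pct": pct(len(proper), len(paired)),
        "mean_mapq": round(mean(q for _, q in mapped), 2) if mapped else 0.0,
        "secondary_pct": pct(sum(1 for f, _ in records if f & SECONDARY), total),
        "supplementary_pct": pct(sum(1 for f, _ in records if f & SUPPLEMENTARY), total),
    }
```

In practice the flag and MAPQ values would come from a BAM or CRAM reader or from the output of a standard QC utility. The sketch only shows how the ratios could be formed.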

Coverage

  • Mean and median depth
  • Uniformity and percent above thresholds
  • Low coverage region count
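
The coverage metrics above can be sketched from a list of per-base depths. Everything here is illustrative: the function name summarize_coverage, the depth cutoffs, and the definition of uniformity as the percent of bases at or above 20 percent of the mean depth are assumptions, since the text does not state which definitions the module uses.

```python
from statistics import mean, median


def summarize_coverage(depths, thresholds=(10, 20, 30), low_cutoff=10):
    """Summarize per-base depths for one region or contig."""
    if not depths:
        raise ValueError("depths must not be empty")
    n = len(depths)
    avg = mean(depths)

    # Assumed definition: percent of bases at or above 20% of the mean depth.
    floor = 0.2 * avg
    uniformity = round(100.0 * sum(d >= floor for d in depths) / n, 2)

    # Percent of bases at or above each depth threshold.
    pct_above = {
        f"pct_ge_{t}x": round(100.0 * sum(d >= t for d in depths) / n, 2)
        for t in thresholds
    }

    # Count contiguous runs of bases below low_cutoff as low-coverage regions.
    low_regions = 0
    in_low = False
    for d in depths:
        if d < low_cutoff and not in_low:
            low_regions += 1
        in_low = d < low_cutoff

    return {
        "mean_depth": avg,
        "median_depth": median(depths),
        "uniformity_pct": uniformity,
        "low_coverage_regions": low_regions,
        **pct_above,
    }
```

Real pipelines usually read depths from a coverage tool's output rather than holding every base in memory. The list-based form keeps the arithmetic easy to follow.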

Library and bias

  • Insert size mean and spread
  • Duplication rate
  • GC bias score

Integrity

  • Contamination estimate
  • Recalibration shift summary
  • Status flags for pass, warn, and fail

All examples use synthetic data. No client identifiers are stored or shown.

Integration

How it fits your workflow

Place the module after alignment with a standard short-read or long-read aligner. Standard QC utilities gather the metrics, and a small script merges their outputs into a single JSON and CSV per sample. The dashboard reads these files and shows each sample with its status flags.

  • Inputs: BAM or CRAM and a reference name
  • Outputs: per sample JSON and CSV, plus a batch summary
  • Reports: download as PDF for audits
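
The merge step can be sketched as follows, assuming each QC utility's output has already been parsed into a Python dict. The function names merge_metrics and write_batch_summary, the file naming scheme, and the example field names are hypothetical and chosen only for illustration.

```python
import csv
import json
from pathlib import Path


def merge_metrics(sample_id, metric_parts, out_dir):
    """Merge per-tool metric dicts and write <sample_id>.json and <sample_id>.csv."""
    merged = {"sample_id": sample_id}
    for part in metric_parts:
        merged.update(part)  # later tools overwrite earlier keys on collision

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / f"{sample_id}.json").write_text(json.dumps(merged, indent=2, sort_keys=True))

    with open(out / f"{sample_id}.csv", "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(merged))
        writer.writeheader()
        writer.writerow(merged)
    return merged


def write_batch_summary(rows, path):
    """Write one CSV row per sample; columns are the union of all keys."""
    fields = []
    for row in rows:
        for key in row:
            if key not in fields:
                fields.append(key)
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

A call such as merge_metrics("S1", [mapping, coverage], "qc_out") would produce the per-sample files, and the returned dicts can be collected and passed to write_batch_summary for the batch view.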

Configuration

Typical starting thresholds

  • Mapped percent at or above 95%
  • Mean MAPQ at or above 30
  • Uniformity at or above 80%
  • Duplicate rate at or below 15%
  • Contamination at or below 2%

These values are placeholders and can be tuned per project and platform.
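
One way to turn the starting thresholds into pass, warn, and fail flags is sketched below. The limits mirror the list above. The metric key names, the 5 percent warn margin, and the rule that a missing metric counts as a fail are assumptions for this example, because the text does not define how warn is decided.

```python
# Limits mirror the starting thresholds; key names are hypothetical.
THRESHOLDS = {
    "mapped_pct": ("min", 95.0),
    "mean_mapq": ("min", 30.0),
    "uniformity_pct": ("min", 80.0),
    "duplicate_pct": ("max", 15.0),
    "contamination_pct": ("max", 2.0),
}

# Assumed rule: a value that misses its limit by at most 5% of the limit is a warn.
WARN_MARGIN = 0.05
SEVERITY = {"pass": 0, "warn": 1, "fail": 2}


def check_metric(value, kind, limit, margin=WARN_MARGIN):
    """Return 'pass', 'warn', or 'fail' for a single metric."""
    slack = abs(limit) * margin
    if kind == "min":
        if value >= limit:
            return "pass"
        return "warn" if value >= limit - slack else "fail"
    if value <= limit:
        return "pass"
    return "warn" if value <= limit + slack else "fail"


def evaluate(metrics, thresholds=THRESHOLDS):
    """Flag each metric and report the worst flag as the overall status."""
    flags = {}
    for name, (kind, limit) in thresholds.items():
        if name in metrics:
            flags[name] = check_metric(metrics[name], kind, limit)
        else:
            flags[name] = "fail"  # assumed: a missing metric fails the sample
    overall = max(flags.values(), key=SEVERITY.__getitem__)
    return {"flags": flags, "status": overall}
```

Keeping the limits in a single dict makes per-project tuning a configuration change rather than a code change.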

Why Sequoia

Outcomes

  • Earlier detection of library or instrument issues
  • More stable variant analysis
  • Consistent quality across sequencing runs
  • Audit-ready reports for oversight

FAQ

Common questions

Can this run on short and long reads?

Yes. The module is platform-agnostic and can summarize metrics for both short-read and long-read workflows.

How are results shared?

Results are exported as JSON and CSV for pipelines and as PDF for audits. Dashboards read from the same files.

How do you handle privacy?

Only synthetic examples are shown in public content. No identifiers are stored in samples or screenshots.