Script to extract the most abundant reads.

Commit 8424f0b7 authored 7 years ago by Blaise Li
--- a/fastx_most_abundant.sh
+++ b/fastx_most_abundant.sh
+#!/bin/sh
+# Extracts the most abundant sequences from a fastq or fasta file (the file can be gzipped)
+# Outputs those reads in fasta format, the most abundant first, with their count as comment
+# Usage: fastx_most_abundant.sh <fastq or fasta file> <number of most abundant sequences wanted>
+# Extract the sequence
+# Sort and count
+# Find the most abundant
+# Format as fasta
+bioawk -c fastx '{print $seq}' "${1}" \
+    | sort | uniq -c \
+    | sort -nr | head -n "${2}" \
+    | mawk '{print ">"NR" ("$1")\n"$2}'
+exit 0
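
The counting, ranking and formatting stages of the pipeline above can be tried on a handful of sequences without bioawk. This is a minimal sketch and not part of the commit: the function name `top_sequences` and the toy input are hypothetical, the input is assumed to hold one sequence per line (as the bioawk step would emit), and plain `awk` stands in for `mawk`.

```sh
#!/bin/sh
# Hypothetical helper (not in the commit): reads one sequence per line
# on stdin and prints the N most abundant ones in fasta format, with the
# rank as the name and the count in parentheses.
top_sequences() {
    sort | uniq -c \
        | sort -nr | head -n "$1" \
        | awk '{print ">"NR" ("$1")\n"$2}'
}

# Toy input: ACGT occurs three times, TTTT twice, GGGG once.
printf '%s\n' ACGT TTTT ACGT GGGG TTTT ACGT | top_sequences 2
# Prints:
# >1 (3)
# ACGT
# >2 (2)
# TTTT
```

Sequences with equal counts come out in whatever order `sort -nr` leaves them, so the ranking among ties should not be relied upon.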