Commit 8424f0b7 authored by Blaise Li

Script to extract the most abundant reads.

parent 53909846
#!/bin/sh
# Extracts the most abundant sequences from a fastq or fasta file (the file can be gzipped)
# Outputs those reads in fasta format, the most abundant first, with their count as a comment
# Usage: fastq2most_abundant.sh <fastq file> <number of top most abundant sequences wanted>
# Extract the sequence
# Sort and count
# Find the most abundant
# Format as fasta
bioawk -c fastx '{print $seq}' "${1}" \
| sort | uniq -c \
| sort -nr | head -n "${2}" \
| mawk '{print ">"NR" ("$1")\n"$2}'
exit 0
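
The output format of the pipeline above can be checked on a tiny input without bioawk installed. The following is only a sketch under stated assumptions: the `top_reads` function name and the test reads are made up for illustration, plain `awk` stands in for both `bioawk` and `mawk`, and the sequence is taken as line 2 of every 4-line record, so it only handles uncompressed, non-wrapped FASTQ. A secondary sort key is added so that ties come out in a fixed order, which the original script does not guarantee.

```shell
# Hypothetical stand-in for the script's pipeline (not the committed script).
# Assumes uncompressed FASTQ with exactly 4 lines per record.
top_reads() {
    # $1: FASTQ file, $2: number of top sequences wanted
    awk 'NR % 4 == 2' "$1" \
        | sort | uniq -c \
        | sort -k1,1nr -k2,2 | head -n "$2" \
        | awk '{print ">"NR" ("$1")\n"$2}'
}

# Made-up input: ACGT occurs three times, TTTT twice, GGGG once.
demo=$(mktemp)
for s in ACGT TTTT ACGT GGGG TTTT ACGT; do
    printf '@r\n%s\n+\n%s\n' "$s" "IIII"
done > "$demo"

top_reads "$demo" 2
rm -f "$demo"
```

On this input the function prints `>1 (3)`, `ACGT`, `>2 (2)`, `TTTT` on four successive lines: the header carries the rank and, in parentheses, the read count.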