diff --git a/README.md b/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..058f471c7729e2deb1e9254ede7e2172bf6500c3
--- /dev/null
+++ b/README.md
@@ -0,0 +1,23 @@
+# Convert MetaPhlAn marker_counts output to a raw count matrix
+
+1. Analyse your samples with MetaPhlAn
+
+The `-t marker_counts` analysis type is required to output the count per marker gene:
+```
+metaphlan --input_type fastq --bowtie2db metaphlan_db -t marker_counts -o sample_count.tsv metagenome_1.fastq,metagenome_2.fastq
+```
+
+2. Aggregate your counts at SGB level
+
+The aggregation at SGB level can be performed with the following command:
+```
+python3 aggregate_SBG.py sample_count.tsv mpa_vOct22_CHOCOPhlAnSGB_202212_SGB_len.txt.gz sample_name sample_aggregated.tsv
+```
+The counts are normalized by marker gene length and rescaled to a default length of 1000.
+
+3. Build the count matrix and the taxonomy matrix
+
+The aggregated files of all samples can be combined with the following command:
+```
+python3 build_matrix.py sample1_aggregated.tsv sample2_aggregated.tsv output_counts.tsv output_taxonomy.tsv
+```
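
The length normalization mentioned in step 2 can be illustrated with a minimal sketch. This is not the code of `aggregate_SBG.py`: the function names, the toy data, and the choice to sum normalized marker counts per SGB are all assumptions made for illustration. The only detail taken from the README is that counts are scaled by marker gene length to a default length of 1000.

```python
from collections import defaultdict


def normalize_count(raw_count, marker_length, target_length=1000):
    """Rescale a raw marker count to a common marker length (hypothetical helper)."""
    if marker_length <= 0:
        raise ValueError("marker_length must be positive")
    return raw_count * target_length / marker_length


def aggregate_by_sgb(marker_counts, marker_lengths, marker_to_sgb, target_length=1000):
    """Sum length-normalized marker counts per SGB.

    Summing is an assumption; the real script may combine markers differently.
    """
    totals = defaultdict(float)
    for marker, raw_count in marker_counts.items():
        sgb = marker_to_sgb[marker]
        totals[sgb] += normalize_count(raw_count, marker_lengths[marker], target_length)
    return dict(totals)


# Hypothetical toy data: three markers belonging to two SGBs.
marker_counts = {"marker_a": 10, "marker_b": 6, "marker_c": 4}
marker_lengths = {"marker_a": 500, "marker_b": 1500, "marker_c": 2000}
marker_to_sgb = {"marker_a": "SGB1", "marker_b": "SGB1", "marker_c": "SGB2"}

result = aggregate_by_sgb(marker_counts, marker_lengths, marker_to_sgb)
print(result)  # {'SGB1': 24.0, 'SGB2': 2.0}
```

In this toy example a marker of length 500 with 10 reads contributes 20 to its SGB, while a marker of length 2000 with 4 reads contributes 2, so longer markers do not dominate the aggregated count simply because they recruit more reads.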