From c0a5aaf5ae2f4aa8079c57ede244021dd1a42fdb Mon Sep 17 00:00:00 2001
From: Amine  GHOZLANE <amine.ghozlane@pasteur.fr>
Date: Tue, 27 Feb 2024 16:02:52 +0100
Subject: [PATCH] Add README.md

---
 README.md | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..058f471
--- /dev/null
+++ b/README.md
@@ -0,0 +1,23 @@
+# Convert MetaPhlAn marker_counts output to a raw count matrix
+
+1. Analyse your samples with MetaPhlAn
+
+The `-t marker_counts` analysis type is required to output the count per marker gene:
+```
+metaphlan --input_type fastq --bowtie2db metaphlan_db -t marker_counts -o sample_count.tsv metagenome_1.fastq,metagenome_2.fastq
+```
+
+2. Aggregate your counts at the SGB level
+
+Counts can be aggregated at the species-level genome bin (SGB) level with the following command:
+```
+python3 aggregate_SBG.py sample_count.tsv mpa_vOct22_CHOCOPhlAnSGB_202212_SGB_len.txt.gz sample_name sample_aggregated.tsv
+```
+Counts are normalized by marker gene length and scaled to a default length of 1000.
+
+3. Build the count matrix and the taxonomy matrix
+
+The aggregated files from all samples can then be combined with the following command:
+```
+python3 build_matrix.py sample1_aggregated.tsv sample2_aggregated.tsv output_counts.tsv output_taxonomy.tsv
+```
-- 
GitLab