From 511e254ff037b2f9a13a783a884e431135547f12 Mon Sep 17 00:00:00 2001
From: Karen  DRUART <karen.druart@pasteur.fr>
Date: Thu, 6 Oct 2022 15:53:18 +0200
Subject: [PATCH] Upload New File

---
 XML_Subsetter/README.md | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
 create mode 100644 XML_Subsetter/README.md

diff --git a/XML_Subsetter/README.md b/XML_Subsetter/README.md
new file mode 100644
index 0000000..57ff771
--- /dev/null
+++ b/XML_Subsetter/README.md
@@ -0,0 +1,32 @@
+# extract_XML_subset
+
+This script can be used to extract XML sequences matching a subset FASTA file OR to extract XML sequences and build a FASTA file from a list of accession IDs. <br>
+It is written in Python 3 and uses only basic packages, making it compatible with all operating systems.
+
+## General information
+This script only reads XML files downloaded from uniprot.org or formatted according to *uniprot.xsd* (https://www.uniprot.org/docs/uniprot.xsd). Each XML file must contain at least the following information:
+- \<entry  dataset="Swiss-Prot" \> **OR**  \<entry  dataset="TrEMBL" \>
+- \<accession\>
+- \<name\>
+- \<fullName\>, whose first occurrence is the UniProt recommended name
+- \<sequence\>
+
+
+## Usage
+This script can be used to: 
+1. extract XML sequences from a subset FASTA file:
+    ```./extract_XML_subset.py -x uniprot.xml -f file.fasta -o output_name```
+2. extract XML sequences and build a FASTA file from a list of accession IDs:
+    ```./extract_XML_subset.py -x uniprot.xml -l accessions.txt -o output_name``` <br>
+The `-o` option is optional. If it is omitted, the name of the FASTA file (in the first case) or of the accession list (in the second case) is used as the output name.
+***
+
+## Improvements
+If you have suggestions, please contact me by email or open an issue through the git interface.
+
+## Contact
+Karen Druart - karen.druart@pasteur.fr
+
+
+
+
-- 
GitLab