Upload New File

511e254f · Karen DRUART · efd91090 · 511e254f
Commit 511e254f authored 2 years ago by Karen DRUART
--- a/XML_Subsetter/README.md
+++ b/XML_Subsetter/README.md
+# extract_XML_subset
+
+This script extracts XML entries from a UniProt XML file, either for the sequences listed in a subset fasta file OR for a list of accession IDs; in the second case it also builds the corresponding fasta file. <br>
+It is written in Python 3 and uses only basic packages, making it compatible with all operating systems.
+
+## General information
+This script only reads XML files downloaded from uniprot.org or formatted according to *uniprot.xsd* (https://www.uniprot.org/docs/uniprot.xsd). The XML file must contain at least the following information:
+- \<entry  dataset="Swiss-Prot" \> **OR**  \<entry  dataset="TrEMBL" \>
+- \<accession\>
+- \<name\>
+- \<fullName\>, whose first occurrence is the UniProt recommended name
+- \<sequence\>
+
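The minimal content listed above can be checked before running the script. The following is only a sketch, not part of `extract_XML_subset.py`: the helper `missing_fields`, the namespace URI and the toy entry (made-up accession, name and sequence) are all assumptions for illustration, so adjust the namespace to whatever your XML file declares.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace declared by the XML file; change it if yours differs.
NS = {"up": "http://uniprot.org/uniprot"}

# Elements the README lists as mandatory (fullName is nested, hence ".//").
REQUIRED = ("up:accession", "up:name", ".//up:fullName", "up:sequence")


def missing_fields(entry):
    """Return the required items that are absent from one <entry> element."""
    missing = []
    if entry.get("dataset") not in ("Swiss-Prot", "TrEMBL"):
        missing.append("dataset")
    for path in REQUIRED:
        if entry.find(path, NS) is None:
            missing.append(path)
    return missing


# Toy document with made-up values, used only to exercise the check.
SAMPLE = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry dataset="Swiss-Prot">
    <accession>A00001</accession>
    <name>TOY1_TEST</name>
    <protein>
      <recommendedName><fullName>Toy protein 1</fullName></recommendedName>
    </protein>
    <sequence>MKTAYIAK</sequence>
  </entry>
</uniprot>"""

root = ET.fromstring(SAMPLE)
for entry in root.findall("up:entry", NS):
    accession = entry.findtext("up:accession", namespaces=NS)
    print(accession, missing_fields(entry))
```

An empty list means the entry carries every item required above; any other result names what is missing.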
+
+## Usage
+This script can be used to: 
+1. extract XML sequences from a subset fasta file:
+    ```./extract_XML_subset.py -x uniprot.xml -f file.fasta -o output_name ```
+2. extract XML sequences and build a fasta file from a list of accession ids
+    ```./extract_XML_subset.py -x uniprot.xml -l accessions.txt -o output_name``` <br>
+The `-o` option is optional; if it is omitted, the name of the fasta file (in the first case) or the name of the accession list (in the second case) is used as the output name.
+***
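To illustrate the first usage (subsetting the XML from a fasta file), here is a minimal sketch of the general idea; it is not the script's actual code. It assumes UniProt-style fasta headers (`>sp|ACCESSION|NAME ...` or `>tr|ACCESSION|NAME ...`), assumes that an entry is kept when any of its `<accession>` elements matches, and uses hypothetical helper names and made-up toy data.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: namespace declared by the XML file; change it if yours differs.
NS = {"up": "http://uniprot.org/uniprot"}

# Assumption: UniProt-style header, e.g. ">sp|A00001|TOY1_TEST Toy protein 1".
HEADER = re.compile(r"^>(?:sp|tr)\|([^|]+)\|")


def accessions_from_fasta(lines):
    """Collect the accession IDs found in fasta header lines."""
    found = set()
    for line in lines:
        match = HEADER.match(line)
        if match:
            found.add(match.group(1))
    return found


def select_entries(root, wanted):
    """Return the <entry> elements with at least one accession in `wanted`."""
    kept = []
    for entry in root.findall("up:entry", NS):
        accessions = {a.text for a in entry.findall("up:accession", NS)}
        if accessions & wanted:
            kept.append(entry)
    return kept


# Toy inputs with made-up accessions, used only to exercise the functions.
FASTA = [
    ">sp|A00001|TOY1_TEST Toy protein 1",
    "MKTAYIAK",
    ">tr|A00003|TOY3_TEST Toy protein 3",
    "MGSSHHHH",
]
XML = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry dataset="Swiss-Prot"><accession>A00001</accession></entry>
  <entry dataset="Swiss-Prot"><accession>A00002</accession></entry>
  <entry dataset="TrEMBL"><accession>A00003</accession></entry>
</uniprot>"""

wanted = accessions_from_fasta(FASTA)
subset = select_entries(ET.fromstring(XML), wanted)
print([e.findtext("up:accession", namespaces=NS) for e in subset])
```

For the second usage, the set of wanted accessions would come from the accession list instead of from fasta headers; the selection step stays the same.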
+
+## Improvements
+If you have suggestions, please contact me by email or open an issue through the git interface.
+
+## Contact
+Karen Druart - karen.druart@pasteur.fr
+
+
+
+