From 511e254ff037b2f9a13a783a884e431135547f12 Mon Sep 17 00:00:00 2001
From: Karen DRUART <karen.druart@pasteur.fr>
Date: Thu, 6 Oct 2022 15:53:18 +0200
Subject: [PATCH] Upload New File

---
 XML_Subsetter/README.md | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
 create mode 100644 XML_Subsetter/README.md

diff --git a/XML_Subsetter/README.md b/XML_Subsetter/README.md
new file mode 100644
index 0000000..57ff771
--- /dev/null
+++ b/XML_Subsetter/README.md
@@ -0,0 +1,32 @@
+# extract_XML_subset
+
+This script can be used to extract XML entries from a subset FASTA file, OR to extract XML entries and build a FASTA file from a list of accession IDs. <br>
+It is written in Python 3 and uses basic packages, which makes it compatible with all operating systems.
+
+## General information
+This script only reads XML files downloaded from uniprot.org or formatted according to *uniprot.xsd* (https://www.uniprot.org/docs/uniprot.xsd). The XML file must contain at least the following elements:
+- \<entry dataset="Swiss-Prot" \> **OR** \<entry dataset="TrEMBL" \>
+- \<accession\>
+- \<name\>
+- \<fullName\>, whose first occurrence must be the UniProt recommended name
+- \<sequence\>
+
+
+## Usage
+This script can be used to:
+1. extract XML entries from a subset FASTA file:
+   ```./extract_XML_subset.py -x uniprot.xml -f file.fasta -o output_name```
+2. extract XML entries and build a FASTA file from a list of accession IDs:
+   ```./extract_XML_subset.py -x uniprot.xml -l accessions.txt -o output_name``` <br>
+The `-o` option is optional: if it is omitted, the name of the FASTA file (first case) or of the accession list (second case) is used as the output name.
+***
+
+## Improvements
+If you have any suggestions, please contact me by email or open an issue in the Git interface.
+
+## Contact
+Karen Druart - karen.druart@pasteur.fr
+
+
+
+
--
GitLab
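The README lists the UniProt XML elements the script depends on but does not show how they are read. As a rough illustration only, here is a minimal Python 3 sketch of both modes using the standard library's `xml.etree.ElementTree`; it is not the code of `extract_XML_subset.py`. It assumes the default UniProt namespace `http://uniprot.org/uniprot`, FASTA headers of the form `>sp|ACCESSION|NAME ...`, and a simplified output header; the functions `accessions_from_fasta` and `extract_entries` and the toy entries in `DEMO_XML` are hypothetical.

```python
"""Hypothetical sketch, not the actual extract_XML_subset.py implementation."""
import io
import xml.etree.ElementTree as ET

UP = "http://uniprot.org/uniprot"  # default namespace of uniprot.xsd documents
NS = {"up": UP}
DB_PREFIX = {"Swiss-Prot": "sp", "TrEMBL": "tr"}  # UniProt-style FASTA prefixes
ET.register_namespace("", UP)  # serialize entries without "ns0:" prefixes

# Toy input with made-up accessions, shaped like a UniProt XML download.
DEMO_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<uniprot xmlns="http://uniprot.org/uniprot">
<entry dataset="Swiss-Prot"><accession>TOY001</accession><name>TOY1_HUMAN</name>
<protein><recommendedName><fullName>Toy protein one</fullName></recommendedName></protein>
<sequence length="5">MKTAY</sequence></entry>
<entry dataset="TrEMBL"><accession>TOY002</accession><name>TOY002_MOUSE</name>
<protein><submittedName><fullName>Toy protein two</fullName></submittedName></protein>
<sequence length="3">MAA</sequence></entry>
</uniprot>"""


def accessions_from_fasta(lines):
    """Collect accessions from FASTA headers such as '>sp|ACC|NAME description'."""
    wanted = set()
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        header = line[1:].strip()
        if not header:
            continue
        fields = header.split("|")
        # Assumption: non-UniProt headers carry the accession as their first word.
        wanted.add(fields[1] if len(fields) >= 3 else header.split()[0])
    return wanted


def extract_entries(xml_source, wanted):
    """Yield (entry_xml, fasta_record) for each <entry> with an accession in `wanted`."""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != f"{{{UP}}}entry":
            continue
        # The first <accession> of a UniProt entry is its primary accession.
        accessions = [a.text for a in elem.findall("up:accession", NS)]
        if wanted.intersection(accessions):
            # Direct-child lookups: <name> also occurs under gene/organism elements.
            name = elem.findtext("up:name", default="", namespaces=NS)
            # First <fullName> in document order; the README expects the recommended name.
            full_name = elem.findtext(".//up:fullName", default="", namespaces=NS)
            # Direct child only, so nested <sequence> elements are ignored.
            raw_seq = elem.findtext("up:sequence", default="", namespaces=NS)
            seq = "".join(raw_seq.split())
            prefix = DB_PREFIX.get(elem.get("dataset"), "xx")
            header = f">{prefix}|{accessions[0]}|{name} {full_name}".rstrip()
            wrapped = "\n".join(seq[i:i + 60] for i in range(0, len(seq), 60))
            yield ET.tostring(elem, encoding="unicode"), f"{header}\n{wrapped}\n"
        elem.clear()  # drop the processed entry's children to limit memory use


if __name__ == "__main__":
    wanted = accessions_from_fasta([">sp|TOY001|TOY1_HUMAN Toy protein one", "MKTAY"])
    for _entry_xml, record in extract_entries(io.BytesIO(DEMO_XML), wanted):
        print(record, end="")  # prints the FASTA record for TOY001
```

Streaming with `iterparse` and clearing each processed `<entry>` avoids keeping every entry's contents in memory, which matters for large UniProt downloads. A real implementation would also need to wrap the selected entries in a `<uniprot>` root element before writing the subset XML file, since a well-formed XML document has a single root.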