Commit 511e254f authored by Karen DRUART
# extract_XML_subset
This script can be used to extract XML sequences matching a subset fasta file OR to extract XML sequences and build a fasta file from a list of accession IDs. <br>
It is written in python3 and uses only basic packages, making it compatible with all operating systems.
## General information
This script only reads XML files from uniprot.org or files formatted following *uniprot.xsd* (https://www.uniprot.org/docs/uniprot.xsd). The XML file must contain at least the following information:
- \<entry dataset="Swiss-Prot" \> **OR** \<entry dataset="TrEMBL" \>
- \<accession\>
- \<name\>
- \<fullName\>, whose first occurrence must be the UniProt recommended name
- \<sequence\>
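
As an illustration of the minimal content listed above, here is a small sketch that reads these fields with Python's standard `xml.etree.ElementTree`. It is not the script's actual code: the function name `read_entries`, the record keys, and the sample entry (accession `ACC0001`, name `EXAMPLE_HUMAN`) are made up for the example, and the namespace `http://uniprot.org/uniprot` is assumed to be the one declared in the XML file.

```python
import xml.etree.ElementTree as ET

# Assumption: the file declares the usual UniProt default namespace.
NS = {"up": "http://uniprot.org/uniprot"}

# Hypothetical entry holding only the minimal fields; all values are invented.
MINIMAL_XML = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry dataset="Swiss-Prot">
    <accession>ACC0001</accession>
    <name>EXAMPLE_HUMAN</name>
    <protein>
      <recommendedName>
        <fullName>Example protein</fullName>
      </recommendedName>
    </protein>
    <sequence>MKTAYIAKQR
GLLWA</sequence>
  </entry>
</uniprot>
"""


def read_entries(xml_text):
    """Return one dict per <entry> with the minimal fields listed above."""
    root = ET.fromstring(xml_text)
    records = []
    for entry in root.findall("up:entry", NS):
        # The first <fullName> in document order is taken as the recommended name.
        full_name = entry.find(".//up:fullName", NS)
        # Only the <sequence> that is a direct child of <entry> is read.
        sequence = entry.find("up:sequence", NS)
        records.append({
            "dataset": entry.get("dataset"),
            "accession": entry.findtext("up:accession", namespaces=NS),
            "name": entry.findtext("up:name", namespaces=NS),
            "full_name": full_name.text if full_name is not None else None,
            # Drop line breaks and spaces inside the sequence text.
            "sequence": "".join(sequence.text.split()) if sequence is not None else None,
        })
    return records


if __name__ == "__main__":
    for record in read_entries(MINIMAL_XML):
        print(record)
```

An entry missing one of these elements would produce `None` for that field in this sketch; how the real script reacts to incomplete entries is not documented here.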
## Usage
This script can be used to:
1. extract XML sequences from a subset fasta file:
```./extract_XML_subset.py -x uniprot.xml -f file.fasta -o output_name```
2. extract XML sequences and build a fasta file from a list of accession IDs:
```./extract_XML_subset.py -x uniprot.xml -l accessions.txt -o output_name``` <br>
The -o option is optional. If it is omitted, the name of the fasta file (in the first case) or the name of the accession list (in the second case) is used as the output name.
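
To make the second mode and the default output name more concrete, the sketch below shows one possible way to pick an output name, select records by accession, and format a fasta entry. It is an illustration under assumptions, not the script's implementation: the helper names (`default_output_name`, `select_records`, `to_fasta`), the record keys, the choice to drop the file extension, and the `>sp|accession|name full name` header layout are all hypothetical.

```python
from pathlib import Path


def default_output_name(input_path, output=None):
    """Use the -o value when given; otherwise fall back to the input file name.

    Assumption: the extension is dropped (file.fasta -> file).
    """
    return output if output else Path(input_path).stem


def select_records(records, accessions):
    """Keep the records whose accession appears in the requested list."""
    wanted = set(accessions)
    return [r for r in records if r["accession"] in wanted]


def to_fasta(record, width=60):
    """Format one record as a fasta entry with a UniProt-like header (assumed layout)."""
    prefix = "sp" if record["dataset"] == "Swiss-Prot" else "tr"
    header = f">{prefix}|{record['accession']}|{record['name']} {record['full_name']}"
    seq = record["sequence"]
    # Wrap the sequence into fixed-width lines.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return header + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Invented records standing in for entries parsed from the XML file.
    records = [
        {"dataset": "Swiss-Prot", "accession": "ACC0001",
         "name": "EXAMPLE_HUMAN", "full_name": "Example protein",
         "sequence": "MKTAYIAKQR"},
        {"dataset": "TrEMBL", "accession": "ACC0002",
         "name": "OTHER_HUMAN", "full_name": "Other protein",
         "sequence": "MSSG"},
    ]
    for record in select_records(records, ["ACC0001"]):
        print(to_fasta(record), end="")
    print(default_output_name("accessions.txt"))
```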
***
## Improvements
If you have any suggestions, please contact me by email or open an issue through the git interface.
## Contact
Karen Druart - karen.druart@pasteur.fr