Commit c201c0a3 authored by Nicolas MAILLET

Huge change in userguide

parent 484cc9bf
User Guide
==========

Overview
========

You can use **Rapid Peptides Generator** (**RPG**) through its standalone command::

    rpg

You can obtain help with::

    rpg --help

Installation
============

From pip
--------

The recommended way to install the latest **RPG** version is through **pip**::

    pip install rpg

Then you can use::

    rpg --help

From source code
----------------

**RPG** is written in Python. To install it manually from sources, clone the repository and install **RPG** with::

    git clone https://gitlab.pasteur.fr/nmaillet/rpg/
    cd rpg
    python setup.py install

Using without installing
------------------------

You can download the source code from Pasteur's **GitLab**: https://gitlab.pasteur.fr/nmaillet/rpg/.

To run **RPG** directly from sources, you need to uncomment line 53 of ``rpg/RapidPeptidesGenerator.py``. Change::

    #from context import rpg

into::

    from context import rpg

Then, from the main **RPG** directory, use::

    python3 rpg/RapidPeptidesGenerator.py --help

.. warning:: This is not recommended, as all packages listed in ``requirements.txt`` must be installed locally, and you may encounter issues with Sphinx autodoc or other unwanted behaviors.

Classical use
=============

Here are the most common ways to use **RPG**.

Getting help
------------

To access the built-in help, use::

    rpg --help

Listing enzymes
---------------

To list all available enzymes, use::

    rpg -l

Performing digestion
--------------------

There are two digestion modes in **RPG**. In sequential mode, each protein is digested by each enzyme, one at a time. In concurrent mode, all enzymes are present at the same time during digestion. See :ref:`digestion` for more information.

.. _oneseq:

Sequential digestion of one sequence
""""""""""""""""""""""""""""""""""""

To perform a sequential digestion of the sequence "QWSDORESDF" with enzymes 2 and 5 and store the results in ``output_file.fasta``, use::

    rpg -i QWSDORESDF -o output_file.fasta -e 2 -e 5

Sequential digestion of a (multi)fasta file
"""""""""""""""""""""""""""""""""""""""""""

To perform a sequential digestion of ``input_file.fasta`` with enzymes 2 and 5 and store the results in ``output_file.fasta``, use::

    rpg -i input_file.fasta -o output_file.fasta -e 2 -e 5

Concurrent digestion of a (multi)fasta file
"""""""""""""""""""""""""""""""""""""""""""

To perform a concurrent digestion of ``input_file.fasta`` with enzymes 2 and 5 and store the results in ``output_file.fasta``, use::

    rpg -i input_file.fasta -o output_file.fasta -e 2 -e 5 -d c

Adding a new enzyme
-------------------

To add a new enzyme, use::

    rpg -a

Options
=======

Here are all the options available in **RPG**:

**-h, --help**: Show this help message and exit.

**-a, --addenzyme**: Add a new user-defined enzyme. See :ref:`addenzyme` for more information.

**-d, --digest**: Digestion mode. Either 's', 'sequential', 'c' or 'concurrent' (default: s). See :ref:`digestion` for more information.

**-e, --enzymes**: Id of the enzyme(s) to use (e.g. -e 0 -e 5 -e 10 to use enzymes 0, 5 and 10). Use -l first to get enzyme ids. See :ref:`enzymes` for more information.

**-f, --fmt**: Output file format. Either 'fasta', 'csv', or 'tsv' (default: fasta). See :ref:`formats` for more information.

**-i, --inputdata**: Input file, in (multi)fasta / fastq format, or a single protein sequence without header. See :ref:`oneseq` for an example.

**-l, --list**: Display the list of available enzymes.

**-m, --misscleavage**: Percentage of miscleavage per enzyme, between 0 and 100. Values are matched to enzymes in the same order as the -e options (e.g. -m 15 -m 5 -m 10). Only for sequential digestion (default: 0). See :ref:`misscleavage` for more information.

**-n, --noninteractive**: Non-interactive mode. No standard output, only error(s) (enables --quiet and overrides -v). If the output file already exists, it is overwritten. See :ref:`nointer` for more information.

**-o, --outputfile**: Output file for the resulting peptides (default: './peptides.xxx', where the extension depends on --fmt).

**-r, --randomname**: Use a random, not already used, output file name. See :ref:`random` for more information.

**-q, --quiet**: No standard output, only error(s).

**-v, --verbose**: Increase output verbosity. -vvv increases it more than -vv or -v. See :ref:`verbose` for more information.

**--version**: Show the program's version number and exit.

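The repeatable options above (**-e**, **-m**, **-v**) follow a common command-line pattern. The sketch below reproduces that behavior with Python's standard ``argparse`` module. It is a hypothetical illustration, not **RPG**'s actual parser. The helper ``pair_enzymes_with_miscleavage`` is also an assumption: it shows how miscleavage values are matched to enzymes by position, with 0 as the default for enzymes that have no value.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring a few RPG options (not RPG's source)."""
    parser = argparse.ArgumentParser(prog="rpg")
    # -e may be repeated; each occurrence appends one enzyme id
    parser.add_argument("-e", "--enzymes", type=int, action="append", default=[])
    # -m may be repeated; values are matched to -e by position
    parser.add_argument("-m", "--misscleavage", type=float, action="append", default=[])
    # Digestion mode: short or long spelling is accepted
    parser.add_argument("-d", "--digest", default="s",
                        choices=["s", "sequential", "c", "concurrent"])
    # -v, -vv, -vvv raise the verbosity level by counting occurrences
    parser.add_argument("-v", "--verbose", action="count", default=0)
    return parser


def pair_enzymes_with_miscleavage(enzymes, miscleavages):
    """Match each enzyme with its miscleavage percentage; missing values default to 0."""
    padded = list(miscleavages) + [0.0] * (len(enzymes) - len(miscleavages))
    return list(zip(enzymes, padded))


args = build_parser().parse_args(
    ["-e", "1", "-e", "2", "-e", "3", "-m", "1.4", "-m", "2.6", "-vv"])
print(pair_enzymes_with_miscleavage(args.enzymes, args.misscleavage))
# [(1, 1.4), (2, 2.6), (3, 0.0)]
print(args.verbose)  # 2
```

Using ``action="append"`` keeps the command line order, which is what allows each **-m** value to be matched to the enzyme given by the **-e** at the same position.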
.. _digestion:

Digestion modes
===============

There are two digestion modes in **RPG**. In 'sequential' mode, each protein is digested by each enzyme, one at a time. Running **RPG** three times on the same protein with three different enzymes gives exactly the same result as running it once on that protein with the three enzymes in 'sequential' mode.

In 'concurrent' mode, all enzymes are present at the same time during digestion and the exposure time is assumed to be infinite, i.e. all possible cleavages **will** occur (there is no miscleavage). In this mode, a cleavage by one enzyme can make a cleavage site available to another enzyme.

Let's define two enzymes. The first one, called 'afterP' (id 28), cleaves after P. The second one, called 'afterK' (id 29), cleaves after K unless this K is preceded by P. Digesting 'PKPKPKPK' with these two enzymes in sequential mode gives the following result::

    $ rpg -i PKPKPKPK -e 28 -e 29
    >Input_0_afterP_1_1_115.13198_5.54
    P
    >Input_1_afterP_3_2_243.30608_9.4
    KP
    >Input_2_afterP_5_2_243.30608_9.4
    KP
    >Input_3_afterP_7_2_243.30608_9.4
    KP
    >Input_4_afterP_8_1_146.18938_9.4
    K
    >Input_0_afterK_0_8_919.17848_11.27
    PKPKPKPK

'afterP' cleaves as expected, while 'afterK' cannot cleave anything.

Digesting 'PKPKPKPK' with the same two enzymes in concurrent mode gives the following result::

    $ rpg -i PKPKPKPK -e 28 -e 29 -d c
    >Input_0_afterP-afterK_1_1_115.13198_5.54
    P
    >Input_1_afterP-afterK_2_1_146.18938_9.4
    K
    >Input_2_afterP-afterK_3_1_115.13198_5.54
    P
    >Input_3_afterP-afterK_4_1_146.18938_9.4
    K
    >Input_4_afterP-afterK_5_1_115.13198_5.54
    P
    >Input_5_afterP-afterK_6_1_146.18938_9.4
    K
    >Input_6_afterP-afterK_7_1_115.13198_5.54
    P
    >Input_7_afterP-afterK_8_1_146.18938_9.4
    K

Here, 'afterP' cleaves at the same positions as in sequential mode. Its products (mostly 'KP') are then cleaved by 'afterK', because in these products K is no longer preceded by P.

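The difference between the two modes can be sketched in a few lines of Python. This is a hypothetical illustration, not **RPG**'s implementation. The enzymes are modeled as the functions ``after_p`` and ``after_k``, which mirror the 'afterP' and 'afterK' rules above. Concurrent mode is modeled as repeatedly cleaving every fragment with all enzymes until nothing changes, and each rule is evaluated on the fragment rather than on the original sequence.

```python
from typing import Callable, List

# A cleavage rule receives a sequence and a cut index i (cut between
# seq[i-1] and seq[i]) and returns True if the enzyme cleaves there.
Rule = Callable[[str, int], bool]


def after_p(seq: str, i: int) -> bool:
    """Hypothetical 'afterP' rule: cleave after any P."""
    return seq[i - 1] == "P"


def after_k(seq: str, i: int) -> bool:
    """Hypothetical 'afterK' rule: cleave after K unless that K is preceded by P."""
    return seq[i - 1] == "K" and (i < 2 or seq[i - 2] != "P")


def cut(seq: str, rule: Rule) -> List[str]:
    """Split seq at every internal position where the rule cleaves."""
    sites = [i for i in range(1, len(seq)) if rule(seq, i)]
    bounds = [0] + sites + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def sequential(seq: str, rules: List[Rule]) -> List[List[str]]:
    """Each enzyme digests the original sequence on its own."""
    return [cut(seq, rule) for rule in rules]


def concurrent(seq: str, rules: List[Rule]) -> List[str]:
    """All enzymes act together until no fragment can be cleaved further."""

    def any_rule(s: str, i: int) -> bool:
        return any(rule(s, i) for rule in rules)

    fragments = [seq]
    while True:
        new_fragments = [piece for frag in fragments for piece in cut(frag, any_rule)]
        if new_fragments == fragments:  # stable: no new cleavage happened
            return fragments
        fragments = new_fragments


print(sequential("PKPKPKPK", [after_p, after_k]))
# [['P', 'KP', 'KP', 'KP', 'K'], ['PKPKPKPK']]
print(concurrent("PKPKPKPK", [after_p, after_k]))
# ['P', 'K', 'P', 'K', 'P', 'K', 'P', 'K']
```

The first pass of ``concurrent`` produces the same fragments as 'afterP' alone. The second pass lets ``after_k`` cut each 'KP', because the K now sits at the start of its fragment.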
The default mode is 'sequential', and miscleavage values can only be given in this mode.

.. _misscleavage:

Miscleavage
===========

Sometimes, an enzyme does not cleave at a given position even though all its requirements are fulfilled. This event is called a miscleavage and can have biological, chemical or physical origins. To take this behavior into account in **RPG**, you can assign a miscleavage value to an enzyme, expressed as a **percentage**.

For example, using::

    rpg -i QWSDORESDF -e 1 -e 2 -e 3 -m 1.4 -m 2.6

assigns a miscleavage probability of ``1.4%`` to enzyme ``1``, a miscleavage probability of ``2.6%`` to enzyme ``2`` and a miscleavage probability of ``0%`` to enzyme ``3`` (default behavior). For enzyme ``1``, each cleavage then has a probability of 0.014 of **not** occurring.

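The probabilistic behavior described above can be sketched as follows. The helper ``apply_miscleavage`` is hypothetical and is not **RPG**'s own code. It assumes that the candidate cleavage sites of one enzyme are already known and that each site is skipped independently with probability ``percent / 100``.

```python
import random
from typing import List, Optional


def apply_miscleavage(sites: List[int], percent: float,
                      rng: Optional[random.Random] = None) -> List[int]:
    """Keep each cleavage site with probability 1 - percent/100 (hypothetical helper)."""
    if not 0 <= percent <= 100:
        raise ValueError("miscleavage must be between 0 and 100")
    rng = rng or random.Random()
    p_miss = percent / 100
    # rng.random() is uniform in [0, 1): the site is skipped when it falls below p_miss
    return [site for site in sites if rng.random() >= p_miss]


# With 1.4%, each of the two Asp-N sites of QWSDORESDF is missed with probability 0.014
kept_sites = apply_miscleavage([3, 8], 1.4, random.Random(42))
```

Passing a seeded ``random.Random`` makes a run reproducible. With 0 every site is kept, and with 100 none is.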
.. _nointer:

Non-interactive mode
====================

Option **-n, --noninteractive** forces **RPG** not to print anything on standard output. Only errors are displayed in the shell. It enables the --quiet option and overrides the --verbose option. If the output file already exists, it is systematically overwritten. This option is mostly used on clusters or in pipelines, when the user does not want **RPG** to wait for input or to display anything but errors.

.. _formats:

Output
======

For each generated peptide, the output of **RPG** contains the following information, in this order:

- Header of the original sequence
- Number of this peptide in the original sequence (starting at 0)
- Enzyme used to obtain this peptide
- Cleavage position on the original sequence (0 if no cleavage occurs)
- Peptide size (number of amino acids)
- Estimated peptide molecular weight
- Estimated peptide isoelectric point (pI)
- Peptide sequence

The peptide molecular weight is approximated as the sum of the average masses of the amino acid residues present in the peptide, plus the average mass of one water molecule. Molecular weights are given in Dalton (Da). No modification is taken into account. For the first and last peptides the computation is not exact: instead of one water molecule, around 17 Da should be added to the N terminal and 1 Da to the C terminal.

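A minimal sketch of this computation follows. The residue masses below are approximate average values taken from standard reference tables. They are an assumption, so **RPG**'s own table may differ in the last decimals. Like **RPG**, the sketch ignores modifications and the terminal-peptide subtlety described above.

```python
# Approximate average residue masses in Da (amino acid minus one water),
# from standard reference tables; RPG's own table may differ slightly.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
    "U": 150.0379,  # selenocysteine
    "O": 237.2982,  # pyrrolysine
}
WATER_AVERAGE_MASS = 18.01528


def peptide_mass(peptide: str) -> float:
    """Sum of residue masses plus one water molecule, without modifications."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in peptide) + WATER_AVERAGE_MASS


print(round(peptide_mass("QWS"), 2))  # 419.44
```

Using residue masses (rather than free amino acid masses) is what makes adding a single water molecule correct. Each peptide bond releases one water, so only the water of the two peptide ends remains.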
The isoelectric point is computed by solving the Henderson–Hasselbalch equation with a binary search. It is based on the work of Lukasz P. Kozlowski (http://isoelectric.org/index.html).

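The principle can be sketched as follows. The net charge given by the Henderson–Hasselbalch equation decreases monotonically with pH, so a binary search between pH 0 and 14 converges on the pH where it crosses zero. The pKa values below are illustrative values of the kind quoted in biochemistry textbooks. They are an assumption: **RPG** relies on Kozlowski's work, so its values and results differ.

```python
from typing import Dict

# Illustrative pKa values (assumption, not RPG's set)
PKA_POSITIVE: Dict[str, float] = {"Nterm": 9.69, "K": 10.5, "R": 12.4, "H": 6.0}
PKA_NEGATIVE: Dict[str, float] = {"Cterm": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}


def net_charge(peptide: str, ph: float) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    charge = 1 / (1 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    charge -= 1 / (1 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    for aa in peptide:
        if aa in PKA_POSITIVE:
            charge += 1 / (1 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1 / (1 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(peptide: str, tolerance: float = 1e-4) -> float:
    """Binary search for the pH where the net charge crosses zero."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if net_charge(peptide, mid) > 0:
            low = mid  # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2
```

Each iteration halves the pH interval, so about 17 iterations are enough to reach a tolerance of 0.0001 starting from a width of 14.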
The default output is in multi-fasta format, where each header summarizes all this information. For example, in the following fasta result::

    >Input_0_Asp-N_3_3_419.43738_5.54
    QWS
    >Input_1_Asp-N_8_5_742.78688_4.16
    ...

we can see the following:

- A sequence was given directly as input to **RPG** (``Input``).
- The first peptide (``0``) was obtained with ``Asp-N``.
- This enzyme cleaved after the 3rd amino acid of the original sequence.
- The peptide has a size of 3 amino acids, an estimated molecular weight of 419.43738 Da and a theoretical isoelectric point of 5.54.
- The full sequence (``QWS``) is then written, followed by the header of the second peptide, and so on.

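To post-process this output, the header fields can be recovered by splitting on underscores from the right. The sketch below is hypothetical and not part of **RPG**. It assumes that the enzyme name contains no underscore, while the original sequence header may contain some.

```python
from typing import Dict, List


def parse_header(line: str) -> Dict[str, object]:
    """Split an RPG fasta header into its seven fields (hypothetical helper)."""
    # rsplit keeps any underscores that belong to the original header
    header, number, enzyme, position, size, mass, pi = line.strip().lstrip(">").rsplit("_", 6)
    return {"header": header, "number": int(number), "enzyme": enzyme,
            "position": int(position), "size": int(size),
            "mass": float(mass), "pI": float(pi)}


def parse_peptides(text: str) -> List[Dict[str, object]]:
    """Read RPG multi-fasta output into one dict per peptide."""
    peptides: List[Dict[str, object]] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            peptides.append(parse_header(line))
            peptides[-1]["sequence"] = ""
        elif line and peptides:
            peptides[-1]["sequence"] += line  # a sequence may span several lines
    return peptides


first = parse_header(">Input_0_Asp-N_3_3_419.43738_5.54")
print(first["enzyme"], first["position"], first["mass"])  # Asp-N 3 419.43738
```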
More information can be displayed using the :ref:`verbose` options.

.. _random:

Random names
============

Option **-r, --randomname** forces **RPG** to use a random name for the output file. With this option, **RPG** does not ask the user for an output file name **or location**. The output file is created in the directory from which the software is run. This option is generally used for testing or automated tasks.

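One way to obtain such a random, unused file name is sketched below. The helper ``random_output_name`` and its use of ``uuid`` are assumptions for illustration. **RPG**'s own naming scheme may differ.

```python
import os
import uuid


def random_output_name(extension: str = "fasta", directory: str = ".") -> str:
    """Return a random file name that does not exist yet in directory (hypothetical)."""
    while True:
        candidate = os.path.join(directory, f"{uuid.uuid4().hex}.{extension}")
        # The check and the later file creation are not atomic; opening the
        # file with mode "x" would make a collision raise FileExistsError.
        if not os.path.exists(candidate):
            return candidate
```

The default directory ``"."`` is the directory the program is run from, matching the behavior described above.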
.. _verbose:

Verbosity
=========

The verbosity of the shell output can be increased or decreased. The output file is not affected by the **-v** or **-q** options.

With the default level (neither **-v** nor **-q**), the output is as explained in :ref:`formats`::

    $ rpg -i QWSDORESDF -e 1
    >Input_0_Asp-N_3_3_419.43738_5.54
    QWS
    >Input_1_Asp-N_8_5_742.78688_4.16
    DORES
    >Input_2_Asp-N_10_2_280.28048_3.6
    DF

Increasing verbosity by one level, i.e. using **-v**, adds information about the options used. For example::

    $ rpg -i QWSDORESDF -e 1 -v
    Warning: File 'peptides.fasta' already exit!
    Overwrite it? (y/n)
    y
    Input: QWSDORESDF
    Enzyme(s) used: ['Asp-N']
    Mode: sequential
    miscleavage ratio: [0]
    Output file: /Users/nmaillet/Prog/RPG/peptides.fasta
    >Input_0_Asp-N_3_3_419.43738_5.54
    QWS
    >Input_1_Asp-N_8_5_742.78688_4.16
    DORES
    >Input_2_Asp-N_10_2_280.28048_3.6
    DF

Increasing verbosity by two levels, i.e. using **-vv**, also adds statistics about each digested protein. For example::

    $ rpg -i QWSDORESDF -e 1 -vv
    Warning: File 'peptides.fasta' already exit!
    Overwrite it? (y/n)
    y
    Input: QWSDORESDF
    Enzyme(s) used: ['Asp-N']
    Mode: sequential
    miscleavage ratio: [0]
    Output file: /Users/nmaillet/Prog/RPG/peptides.fasta
    Number of cleavage: 2
    Cleavage position: 3, 8
    Number of miscleavage: 0
    miscleavage position:
    miscleavage ratio: 0.00%
    Smallest peptide size: 2
    N terminal peptide: QWS
    C terminal peptide: DF
    >Input_0_Asp-N_3_3_419.43738_5.54
    QWS
    >Input_1_Asp-N_8_5_742.78688_4.16
    DORES
    >Input_2_Asp-N_10_2_280.28048_3.6
    DF

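Some of these statistics can be recomputed from the cleavage positions alone. The helper ``digestion_stats`` below is a hypothetical sketch, not **RPG**'s code. It covers only the following statistics:

- the cleavage count and positions
- the smallest peptide size
- the terminal peptides

```python
from typing import Dict, List


def digestion_stats(sequence: str, cleavage_sites: List[int]) -> Dict[str, object]:
    """Recompute part of the -vv statistics from cleavage positions (hypothetical)."""
    sites = sorted(cleavage_sites)
    bounds = [0] + sites + [len(sequence)]
    # Each peptide spans from one cleavage position to the next
    peptides = [sequence[a:b] for a, b in zip(bounds, bounds[1:])]
    return {
        "number_of_cleavages": len(sites),
        "cleavage_positions": sites,
        "smallest_peptide_size": min(len(p) for p in peptides),
        "n_terminal_peptide": peptides[0],
        "c_terminal_peptide": peptides[-1],
    }


stats = digestion_stats("QWSDORESDF", [3, 8])
print("Number of cleavage:", stats["number_of_cleavages"])  # Number of cleavage: 2
print("Cleavage position:", ", ".join(map(str, stats["cleavage_positions"])))  # Cleavage position: 3, 8
```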
Decreasing verbosity, i.e. using the **-q** option, removes all information except errors and interactive prompts such as the overwrite question. For example::

    $ rpg -i QWSDORESDF -e 1 -q
    Warning: File 'peptides.fasta' already exit!
    Overwrite it? (y/n)
    y

.. _addenzyme:

Creating a new enzyme
=====================

To do: coming very soon.