Commit e005e4ef authored by Julien GUGLIELMINI

Readme update

parent 28dd8755
@@ -94,15 +94,16 @@ The first two columns indicate the pair of elements for this row.
wGRR1 is the simplest way of calculating the wGRR. It corresponds to the sum of the identity scores for all BBH between the two elements, divided by the smallest total number of proteins of the two elements (columns RealLengthA and RealLengthB).
Common1 is the proportion of common proteins, _i.e._ the number of BBH pairs divided by the same denominator as the wGRR.
Common1 is the proportion of common proteins, _i.e._ the number of BBH pairs divided by the mean number of proteins of the two elements.
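
As a rough illustration of the wGRR1 and Common1 definitions (with Common1 using the mean number of proteins), here is a minimal sketch. The function name, its arguments, and the assumption that identity scores are fractions between 0 and 1 are all hypothetical and not taken from the tool itself.

```python
def wgrr1_common1(bbh_identities, n_prot_a, n_prot_b):
    """Sketch of wGRR1 and Common1 for one pair of elements.

    bbh_identities: one identity score per BBH pair (assumed to be in [0, 1]).
    n_prot_a, n_prot_b: total protein counts (RealLengthA and RealLengthB).
    """
    # wGRR1: sum of BBH identities over the smallest protein count.
    wgrr1 = sum(bbh_identities) / min(n_prot_a, n_prot_b)
    # Common1: number of BBH pairs over the mean protein count.
    common1 = len(bbh_identities) / ((n_prot_a + n_prot_b) / 2)
    return wgrr1, common1


# Three BBH pairs between an element of 4 proteins and one of 6 proteins.
wgrr1, common1 = wgrr1_common1([1.0, 0.5, 0.5], n_prot_a=4, n_prot_b=6)
print(wgrr1, common1)
```

With these made-up inputs, wGRR1 is 2.0 / 4 and Common1 is 3 / 5.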
wGRR2 uses the same numerator as wGRR1. For the denominator calculation, an extra single linkage clustering step is performed to build protein families. If a protein belongs to a family containing a BBH, but this protein is not part of a BBH, then it is not counted in the denominator, resulting sometimes in a greater wGRR. This can be useful when lots of paralogs are expected - the presence of paralogs will artificially lower the wGRR as calculated for wGRR1.
wGRR2 uses the same numerator as wGRR1. For the denominator calculation, an extra single linkage clustering step is performed to build protein families. If a protein belongs to a family containing a BBH but this protein is not part of a BBH, then it is not counted in the denominator, resulting sometimes in a greater wGRR. This results in a new number of proteins (nProt2) per element, and the denominator is the smallest nProt2.
wGRR2 is especially useful when many paralogs are expected, because the presence of paralogs artificially lowers wGRR1.
Common2 is the number of BBH pairs divided by the denominator of wGRR2.
Common2 is the number of BBH pairs divided by the mean nProt2 of the two elements.
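
The nProt2 correction can be sketched as follows, assuming the protein families are already available. This is one reading of the description above, not the tool's actual code: the names `n_prot2`, `wgrr2_common2`, and `family_of`, the input layout, and identity scores in [0, 1] are hypothetical.

```python
def n_prot2(proteins, family_of, bbh_members):
    """Count the proteins of one element, skipping non-BBH members of BBH families.

    family_of maps a protein to its family ID; unclustered proteins are absent.
    bbh_members is the set of all proteins (from both elements) that are in a BBH.
    """
    families_with_bbh = {family_of[p] for p in bbh_members if p in family_of}
    return sum(
        1 for p in proteins
        if p in bbh_members or family_of.get(p) not in families_with_bbh
    )


def wgrr2_common2(bbh, proteins_a, proteins_b, family_of):
    """bbh is a list of (protein_a, protein_b, identity) tuples."""
    members = {a for a, _, _ in bbh} | {b for _, b, _ in bbh}
    n_a = n_prot2(proteins_a, family_of, members)
    n_b = n_prot2(proteins_b, family_of, members)
    wgrr2 = sum(identity for _, _, identity in bbh) / min(n_a, n_b)
    common2 = len(bbh) / ((n_a + n_b) / 2)
    return wgrr2, common2


# a3 is a paralog of a1: same family as the a1/b1 BBH, but not in a BBH itself.
proteins_a = ["a1", "a2", "a3", "a4"]
proteins_b = ["b1", "b2", "b3", "b4"]
bbh = [("a1", "b1", 1.0), ("a2", "b2", 0.5)]
family_of = {"a1": "F1", "a3": "F1", "b1": "F1"}

wgrr2, common2 = wgrr2_common2(bbh, proteins_a, proteins_b, family_of)
print(wgrr2, common2)
```

In this toy case the paralog a3 is dropped, so nProt2 is 3 for the first element and 4 for the second. The denominator becomes 3 instead of 4, which raises the score from 0.375 (wGRR1) to 0.5 (wGRR2).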
wGRR3 also uses the protein families. But this time, if two BBH pairs are found in the same family, only one will be counted (the one with the highest identity score) for the numerator. The denominator is the number of protein families + the number of proteins that are not part of a family.
wGRR3 also uses the protein families. But this time, if two BBH pairs are found in the same family, only one will be counted (the one with the highest identity score) for the numerator. nProt3 is the number of protein families + the number of proteins that are not part of a family. The denominator is the smallest nProt3.
Common3 is the number of protein families containing at least one BBH divided by the denominator of wGRR3.
Common3 is the number of protein families containing at least one BBH divided by the mean nProt3 of the two elements.
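
A minimal sketch of the wGRR3 and Common3 calculation, under several assumptions that the text does not confirm: families are given as a hypothetical `family_of` mapping, both proteins of every BBH pair sit in the same family, nProt3 is computed per element, and identity scores are fractions in [0, 1].

```python
def n_prot3(proteins, family_of):
    """Number of families present in an element plus its unclustered proteins."""
    families = {family_of[p] for p in proteins if p in family_of}
    unclustered = sum(1 for p in proteins if p not in family_of)
    return len(families) + unclustered


def wgrr3_common3(bbh, proteins_a, proteins_b, family_of):
    """bbh is a list of (protein_a, protein_b, identity) tuples."""
    # Keep only the best identity score per family (assumes protein_a is clustered).
    best = {}
    for a, _, identity in bbh:
        family = family_of[a]
        best[family] = max(best.get(family, 0.0), identity)
    n_a = n_prot3(proteins_a, family_of)
    n_b = n_prot3(proteins_b, family_of)
    wgrr3 = sum(best.values()) / min(n_a, n_b)
    # len(best) is the number of families containing at least one BBH.
    common3 = len(best) / ((n_a + n_b) / 2)
    return wgrr3, common3


proteins_a = ["a1", "a2", "a3", "a4"]
proteins_b = ["b1", "b2", "b3", "b4"]
family_of = {"a1": "F1", "a2": "F1", "b1": "F1", "b2": "F1", "a3": "F2", "b3": "F2"}
bbh = [("a1", "b1", 1.0), ("a2", "b2", 0.5), ("a3", "b3", 0.5)]

wgrr3, common3 = wgrr3_common3(bbh, proteins_a, proteins_b, family_of)
print(wgrr3, common3)
```

Here the two BBH pairs in family F1 collapse to the best one (1.0), so the numerator is 1.0 + 0.5. Each element has two families and one unclustered protein, giving nProt3 = 3 on both sides.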
![wgrr.png](https://gitlab.pasteur.fr/jgugliel/wgrr/-/raw/main/wgrr.png)