Commit ea36cda5 authored by Bertrand NÉRON

transform exercise on enzyme

use tuple instead of namedtuple
keep namedtuple implementation as bonus
parent e0f44684
......@@ -469,20 +469,21 @@ So we can write the reverse complement without loop.
Exercise
--------
let the following enzymes collection: ::
Consider the following collection of enzymes:
We decide to implement each enzyme as a tuple with the following structure
("name", "comment", "sequence", "cut", "end")
::
import collections
RestrictEnzyme = collections.namedtuple("RestrictEnzyme", ("name", "comment", "sequence", "cut", "end"))
ecor1 = RestrictEnzyme("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
ecor5 = RestrictEnzyme("EcoRV", "Ecoli restriction enzime V", "gatatc", 3, "blunt")
bamh1 = RestrictEnzyme("BamHI", "type II restriction endonuclease from Bacillus amyloliquefaciens ", "ggatcc", 1, "sticky")
hind3 = RestrictEnzyme("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1 , "sticky")
taq1 = RestrictEnzyme("TaqI", "Thermus aquaticus", "tcga", 1 , "sticky")
not1 = RestrictEnzyme("NotI", "Nocardia otitidis", "gcggccgc", 2 , "sticky")
sau3a1 = RestrictEnzyme("Sau3aI", "Staphylococcus aureus", "gatc", 0 , "sticky")
hae3 = RestrictEnzyme("HaeIII", "Haemophilus aegyptius", "ggcc", 2 , "blunt")
sma1 = RestrictEnzyme("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt")
ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
ecor5 = ("EcoRV", "Ecoli restriction enzime V", "gatatc", 3, "blunt")
bamh1 = ("BamHI", "type II restriction endonuclease from Bacillus amyloliquefaciens ", "ggatcc", 1, "sticky")
hind3 = ("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1 , "sticky")
taq1 = ("TaqI", "Thermus aquaticus", "tcga", 1 , "sticky")
not1 = ("NotI", "Nocardia otitidis", "gcggccgc", 2 , "sticky")
sau3a1 = ("Sau3aI", "Staphylococcus aureus", "gatc", 0 , "sticky")
hae3 = ("HaeIII", "Haemophilus aegyptius", "ggcc", 2 , "blunt")
sma1 = ("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt")
and the 2 dna fragments: ::
......@@ -499,12 +500,15 @@ and the 2 dna fragments: ::
| the dna_1 but not the dna_2?
#. Write a function *seq_one_line* which take a multi lines sequence and return a sequence in one line.
#. Write a function *enz_filter* which take a sequence and a list of enzymes and return a new list containing
the enzymes which have a binding site in the sequence
#. use the functions above to compute the enzymes which cut the dna_1
apply the same functions to compute the enzymes which cut the dna_2
compute the difference between the enzymes which cut the dna_1 and enzymes which cut the dna_2
* In a file <my_file.py>
#. Write a function *seq_one_line* which takes a multi-line sequence and returns the sequence on one line.
#. Write a function *enz_filter* which takes a sequence and a list of enzymes and returns a new list containing
the enzymes which have a binding site in the sequence
#. open a terminal with the command python -i <my_file.py>
#. copy and paste the enzymes and dna fragments
#. use the functions above to compute the enzymes which cut the dna_1
apply the same functions to compute the enzymes which cut the dna_2
compute the difference between the enzymes which cut the dna_1 and enzymes which cut the dna_2
.. literalinclude:: _static/code/enzyme_1.py
:linenos:
......@@ -532,8 +536,36 @@ with this algorithm we find if an enzyme cut the dna but we cannot find all cuts
The latter algorithm displays the number of occurrences of each enzyme, but we cannot determine the position of every site.
We will see how to do this later.
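As a rough illustration of this limitation, the sketch below counts binding sites with `str.count`, which reports how many non-overlapping occurrences exist but not where they are. This is one possible way to count, not necessarily the algorithm used in the solution, and the sequence `dna` is a made-up example.

```python
# Enzymes as plain tuples: (name, comment, sequence, cut, end)
ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
sma1 = ("SmaI", "Serratia marcescens", "cccggg", 3, "blunt")

# Hypothetical sequence containing two EcoRI sites and one SmaI site
dna = "aagaattcttcccgggaagaattcaa"

counts = {}
for enz in (ecor1, sma1):
    # enz[0] is the name, enz[2] the binding site
    counts[enz[0]] = dna.count(enz[2])

print(counts)
```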
Bonus
"""""
There is another kind of tuple which allows access to items by index or by name.
This data collection is called a namedtuple. It is not available directly: it lives in the `collections` module,
so we have to import it before using it.
We also have to define which name corresponds to which item::
import collections
RestrictEnzyme = collections.namedtuple("RestrictEnzyme", ("name", "comment", "sequence", "cut", "end"))
Then we can use this new kind of tuple::
ecor1 = RestrictEnzyme("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
ecor5 = RestrictEnzyme("EcoRV", "Ecoli restriction enzime V", "gatatc", 3, "blunt")
bamh1 = RestrictEnzyme("BamHI", "type II restriction endonuclease from Bacillus amyloliquefaciens ", "ggatcc", 1, "sticky")
hind3 = RestrictEnzyme("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1 , "sticky")
taq1 = RestrictEnzyme("TaqI", "Thermus aquaticus", "tcga", 1 , "sticky")
not1 = RestrictEnzyme("NotI", "Nocardia otitidis", "gcggccgc", 2 , "sticky")
sau3a1 = RestrictEnzyme("Sau3aI", "Staphylococcus aureus", "gatc", 0 , "sticky")
hae3 = RestrictEnzyme("HaeIII", "Haemophilus aegyptius", "ggcc", 2 , "blunt")
sma1 = RestrictEnzyme("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt")
The code must be adapted as below
.. literalinclude:: _static/code/enzyme_1_namedtuple.py
:linenos:
:language: python
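To make the difference with a classic tuple concrete, here is a minimal sketch of the two access styles on a single enzyme. It only relies on `collections.namedtuple` from the standard library; the variable names `by_index` and `by_name` are chosen for the example.

```python
import collections

RestrictEnzyme = collections.namedtuple(
    "RestrictEnzyme", ("name", "comment", "sequence", "cut", "end")
)
ecor1 = RestrictEnzyme("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")

# Access by position, exactly as with a classic tuple
by_index = ecor1[2]
# Access by field name, which is easier to read
by_name = ecor1.sequence
print(by_index, by_name)

# A namedtuple is still a tuple: unpacking and len() work as usual
name, comment, sequence, cut, end = ecor1
print(name, cut, end, len(ecor1))
```

Because a namedtuple is a subclass of `tuple`, code written with `enz[2]` keeps working when the enzymes are turned into namedtuples.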
Exercise
--------
......
......@@ -149,7 +149,7 @@ search all positions of Ecor1 binding sites in dna_1
import collections
RestrictEnzyme = collections.namedtuple("RestrictEnzyme", "name comment sequence cut end")
ecor1 = RestrictEnzyme("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
dna_1 = """tcgcgcaacgtcgcctacatctcaagattcagcgccgagatccccgggggttgagcgatccccgtcagttggcgtgaattcag
cagcagcgcaccccgggcgtagaattccagttgcagataatagctgatttagttaacttggatcacagaagcttccaga
......@@ -179,19 +179,16 @@ in bonus we can try to sort the list in the order of the position of the binding
:language: python
::
import collections
RestrictEnzyme = collections.namedtuple("RestrictEnzyme", "name comment sequence cut end")
ecor1 = RestrictEnzyme("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
ecor5 = RestrictEnzyme("EcoRV", "Ecoli restriction enzime V", "gatatc", 3, "blunt")
bamh1 = RestrictEnzyme("BamHI", "type II restriction endonuclease from Bacillus amyloliquefaciens ", "ggatcc", 1, "sticky")
hind3 = RestrictEnzyme("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1 , "sticky")
taq1 = RestrictEnzyme("TaqI", "Thermus aquaticus", "tcga", 1 , "sticky")
not1 = RestrictEnzyme("NotI", "Nocardia otitidis", "gcggccgc", 2 , "sticky")
sau3a1 = RestrictEnzyme("Sau3aI", "Staphylococcus aureus", "gatc", 0 , "sticky")
hae3 = RestrictEnzyme("HaeIII", "Haemophilus aegyptius", "ggcc", 2 , "blunt")
sma1 = RestrictEnzyme("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt")
ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
ecor5 = ("EcoRV", "Ecoli restriction enzime V", "gatatc", 3, "blunt")
bamh1 = ("BamHI", "type II restriction endonuclease from Bacillus amyloliquefaciens ", "ggatcc", 1, "sticky")
hind3 = ("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1 , "sticky")
taq1 = ("TaqI", "Thermus aquaticus", "tcga", 1 , "sticky")
not1 = ("NotI", "Nocardia otitidis", "gcggccgc", 2 , "sticky")
sau3a1 = ("Sau3aI", "Staphylococcus aureus", "gatc", 0 , "sticky")
hae3 = ("HaeIII", "Haemophilus aegyptius", "ggcc", 2 , "blunt")
sma1 = ("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt")
and the 2 dna fragments: ::
......@@ -214,6 +211,13 @@ and the 2 dna fragments: ::
:download:`restriction.py <_static/code/restriction.py>` .
Bonus
"""""
If you prefer the enzymes implemented as namedtuple:
:download:`restriction.py <_static/code/restriction_namedtuple.py>` .
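If the enzymes already exist as plain tuples, they do not have to be retyped. The sketch below converts them with the `_make` class method of namedtuple; the list name `plain_enzymes` is an assumption made for the example.

```python
import collections

RestrictEnzyme = collections.namedtuple(
    "RestrictEnzyme", ("name", "comment", "sequence", "cut", "end")
)

# Enzymes written as classic tuples
plain_enzymes = [
    ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky"),
    ("SmaI", "Serratia marcescens", "cccggg", 3, "blunt"),
]

# _make builds a namedtuple from any iterable of the right length
named_enzymes = [RestrictEnzyme._make(enz) for enz in plain_enzymes]

for enz in named_enzymes:
    print(enz.name, enz.sequence, enz.end)
```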
Exercise
--------
......
......@@ -6,7 +6,7 @@ def one_line(seq):
def enz_filter(enzymes, dna):
    cuting_enz = []
    for enz in enzymes:
        if enz.sequence in dna:
        if enz[2] in dna:
            cuting_enz.append(enz)
    return cuting_enz
def one_line(seq):
    return seq.replace('\n', '')

def enz_filter(enzymes, dna):
    cuting_enz = []
    for enz in enzymes:
        if enz.sequence in dna:
            cuting_enz.append(enz)
    return cuting_enz
from operator import itemgetter
# we decide to implement enzyme as tuple with the following structure
# ("name", "comment", "sequence", "cut", "end")
#    0         1          2         3      4
def one_enz_one_binding_site(dna, enzyme):
    """
    :return: the first position of enzyme binding site in dna or None if there is none
    :rtype: int or None
    """
    print("one_enz_binding_one_site", dna, enzyme)
    pos = dna.find(enzyme.sequence)
    print("one_enz_binding_one_site", pos)
    pos = dna.find(enzyme[2])
    if pos != -1:
        return pos
......@@ -23,10 +24,10 @@ def one_enz_all_binding_sites(dna, enzyme):
    :rtype: list of int
    """
    positions = []
    pos = dna.find(enzyme.sequence)
    pos = dna.find(enzyme[2])
    while pos != -1:
        positions.append(pos)
        pos = dna.find(enzyme.sequence, pos + 1)
        pos = dna.find(enzyme[2], pos + 1)
    return positions
......@@ -40,14 +41,14 @@ def one_enz_all_binding_sites2(dna, enzyme):
    :rtype: list of int
    """
    positions = []
    pos = dna.find(enzyme.sequence)
    pos = dna.find(enzyme[2])
    while pos != -1:
        if positions:
            # pos is relative to the last slice: shift it back to dna coordinates
            pos = pos + positions[-1] + 1
        positions.append(pos)
        new_seq = dna[pos + 1:]
        pos = new_seq.find(enzyme.sequence)
        pos = new_seq.find(enzyme[2])
    return positions
......@@ -67,7 +68,7 @@ def binding_sites(dna, enzymes):
    positions = []
    for enzyme in enzymes:
        pos = one_enz_all_binding_sites(dna, enzyme)
        pos = [(enzyme.name, pos) for pos in pos]
        pos = [(enzyme[0], pos) for pos in pos]
        positions.extend(pos)
    positions.sort(key=itemgetter(1))
    return positions
......
######################################################
# Solution using namedtuple instead of classic tuple #
# see how the code is more readable                  #
######################################################
import collections
RestrictEnzyme = collections.namedtuple("RestrictEnzyme", ("name", "comment", "sequence", "cut", "end"))
ecor1 = RestrictEnzyme("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
ecor5 = RestrictEnzyme("EcoRV", "Ecoli restriction enzime V", "gatatc", 3, "blunt")
bamh1 = RestrictEnzyme("BamHI", "type II restriction endonuclease from Bacillus amyloliquefaciens ", "ggatcc", 1, "sticky")
hind3 = RestrictEnzyme("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1 , "sticky")
taq1 = RestrictEnzyme("TaqI", "Thermus aquaticus", "tcga", 1 , "sticky")
not1 = RestrictEnzyme("NotI", "Nocardia otitidis", "gcggccgc", 2 , "sticky")
sau3a1 = RestrictEnzyme("Sau3aI", "Staphylococcus aureus", "gatc", 0 , "sticky")
hae3 = RestrictEnzyme("HaeIII", "Haemophilus aegyptius", "ggcc", 2 , "blunt")
sma1 = RestrictEnzyme("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt")
from operator import itemgetter
def one_enz_one_binding_site(dna, enzyme):
    """
    :return: the first position of enzyme binding site in dna or None if there is none
    :rtype: int or None
    """
    print("one_enz_binding_one_site", dna, enzyme)
    pos = dna.find(enzyme.sequence)
    print("one_enz_binding_one_site", pos)
    if pos != -1:
        return pos
def one_enz_all_binding_sites(dna, enzyme):
    """
    :param dna: the dna sequence to search enzyme binding sites
    :type dna: str
    :param enzyme: the enzyme to look for
    :type enzyme: a namedtuple RestrictEnzyme
    :return: all positions of enzyme binding sites in dna
    :rtype: list of int
    """
    positions = []
    pos = dna.find(enzyme.sequence)
    while pos != -1:
        positions.append(pos)
        # restart the search just after the previous match
        pos = dna.find(enzyme.sequence, pos + 1)
    return positions
def one_enz_all_binding_sites2(dna, enzyme):
    """
    :param dna: the dna sequence to search enzyme binding sites
    :type dna: str
    :param enzyme: the enzyme to look for
    :type enzyme: a namedtuple RestrictEnzyme
    :return: all positions of enzyme binding sites in dna
    :rtype: list of int
    """
    positions = []
    pos = dna.find(enzyme.sequence)
    while pos != -1:
        if positions:
            # pos is relative to the last slice: shift it back to dna coordinates
            pos = pos + positions[-1] + 1
        positions.append(pos)
        new_seq = dna[pos + 1:]
        pos = new_seq.find(enzyme.sequence)
    return positions
def binding_sites(dna, enzymes):
    """
    return all positions of all enzymes binding sites present in dna
    sorted by increasing position.

    :param dna: the dna sequence to search enzyme binding sites
    :type dna: str
    :param enzymes: the enzymes to look for
    :type enzymes: a sequence of namedtuple RestrictEnzyme
    :return: all positions of each enzyme binding sites in dna
    :rtype: list of (enzyme name, position) tuples
    """
    positions = []
    for enzyme in enzymes:
        pos = one_enz_all_binding_sites(dna, enzyme)
        pos = [(enzyme.name, pos) for pos in pos]
        positions.extend(pos)
    positions.sort(key=itemgetter(1))
    return positions
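A short usage sketch of the namedtuple solution follows. It restates compact versions of the two functions so that it runs on its own; the helper name `all_sites` and the sequence `dna` are assumptions made for the example, not part of the course files.

```python
import collections
from operator import itemgetter

RestrictEnzyme = collections.namedtuple(
    "RestrictEnzyme", ("name", "comment", "sequence", "cut", "end")
)
ecor1 = RestrictEnzyme("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
sma1 = RestrictEnzyme("SmaI", "Serratia marcescens", "cccggg", 3, "blunt")


def all_sites(dna, enzyme):
    """Return every start position of the enzyme binding site in dna."""
    positions = []
    pos = dna.find(enzyme.sequence)
    while pos != -1:
        positions.append(pos)
        pos = dna.find(enzyme.sequence, pos + 1)
    return positions


def binding_sites(dna, enzymes):
    """Return (enzyme name, position) pairs sorted by position."""
    positions = []
    for enzyme in enzymes:
        positions.extend((enzyme.name, pos) for pos in all_sites(dna, enzyme))
    # itemgetter(1) sorts on the position, the second item of each pair
    positions.sort(key=itemgetter(1))
    return positions


# Hypothetical sequence with two EcoRI sites and one SmaI site
dna = "aagaattcttcccgggaagaattcaa"
sites = binding_sites(dna, [sma1, ecor1])
for name, pos in sites:
    print(name, pos)
```

Sorting on the position rather than on the name interleaves the enzymes in the order in which their sites appear along the sequence.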