hub-courses / python_one_week_4_biologists_solutions
Commit ea36cda5 (unverified), authored Jun 06, 2021 by Bertrand NÉRON
transform exercise on enzyme
use tuple instead of namedtuple, keep namedtuple implementation as bonus
parent e0f44684
Changes: 6 files
source/Collection_Data_Types.rst
...
...
@@ -469,20 +469,21 @@ So we can write the reverse complement without loop.
 Exercise
 --------
-Consider the following collection of enzymes: ::
+Consider the following collection of enzymes:
+We decide to implement each enzyme as a tuple with the following structure
+("name", "comment", "sequence", "cut", "end")
+::
-import collections
-RestrictEnzyme = collections.namedtuple("RestrictEnzyme", ("name", "comment", "sequence", "cut", "end"))
-ecor1 = RestrictEnzyme("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
-ecor5 = RestrictEnzyme("EcoRV", "Ecoli restriction enzime V", "gatatc", 3, "blunt")
-bamh1 = RestrictEnzyme("BamHI", "type II restriction endonuclease from Bacillus amyloliquefaciens ", "ggatcc", 1, "sticky")
-hind3 = RestrictEnzyme("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1 , "sticky")
-taq1 = RestrictEnzyme("TaqI", "Thermus aquaticus", "tcga", 1 , "sticky")
-not1 = RestrictEnzyme("NotI", "Nocardia otitidis", "gcggccgc", 2 , "sticky")
-sau3a1 = RestrictEnzyme("Sau3aI", "Staphylococcus aureus", "gatc", 0 , "sticky")
-hae3 = RestrictEnzyme("HaeIII", "Haemophilus aegyptius", "ggcc", 2 , "blunt")
-sma1 = RestrictEnzyme("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt")
+ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
+ecor5 = ("EcoRV", "Ecoli restriction enzime V", "gatatc", 3, "blunt")
+bamh1 = ("BamHI", "type II restriction endonuclease from Bacillus amyloliquefaciens ", "ggatcc", 1, "sticky")
+hind3 = ("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1 , "sticky")
+taq1 = ("TaqI", "Thermus aquaticus", "tcga", 1 , "sticky")
+not1 = ("NotI", "Nocardia otitidis", "gcggccgc", 2 , "sticky")
+sau3a1 = ("Sau3aI", "Staphylococcus aureus", "gatc", 0 , "sticky")
+hae3 = ("HaeIII", "Haemophilus aegyptius", "ggcc", 2 , "blunt")
+sma1 = ("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt")
 and the 2 dna fragments: ::
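With the plain-tuple layout introduced in this hunk, each field is reached by its position. A minimal sketch of what that looks like, reusing the `ecor1` value from the exercise; the index constants `NAME`, `COMMENT`, `SEQUENCE`, `CUT` and `END` are illustrative names assumed here, not part of the course material:

```python
# Field positions in the enzyme tuple:
# ("name", "comment", "sequence", "cut", "end")
NAME, COMMENT, SEQUENCE, CUT, END = range(5)  # hypothetical helper constants

ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")

print(ecor1[0])         # name, accessed with a literal index
print(ecor1[SEQUENCE])  # recognition sequence, accessed with a named index

# A tuple can also be unpacked into separate variables in one statement
name, comment, sequence, cut, end = ecor1
print(name, sequence, cut, end)
```

Naming the indices is only a readability aid; the solutions in this commit use the literal indices `enz[2]` and `enzyme[0]`.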
...
...
@@ -499,12 +500,15 @@ and the 2 dna fragments: ::
 | the dna_1 but not the dna_2?
-#. Write a function *seq_one_line* which takes a multi-line sequence and returns the sequence on one line.
-#. Write a function *enz_filter* which takes a sequence and a list of enzymes and returns a new list containing
-the enzymes which have a binding site in the sequence
-#. Use the functions above to compute the enzymes which cut dna_1,
-apply the same functions to compute the enzymes which cut dna_2, then
-compute the difference between the enzymes which cut dna_1 and the enzymes which cut dna_2
+* In a file <my_file.py>
+#. Write a function *seq_one_line* which takes a multi-line sequence and returns the sequence on one line.
+#. Write a function *enz_filter* which takes a sequence and a list of enzymes and returns a new list containing
+the enzymes which have a binding site in the sequence
+#. Open a terminal and run the command python -i <my_file.py>
+#. Copy and paste the enzymes and the dna fragments
+#. Use the functions above to compute the enzymes which cut dna_1,
+apply the same functions to compute the enzymes which cut dna_2, then
+compute the difference between the enzymes which cut dna_1 and the enzymes which cut dna_2
 .. literalinclude:: _static/code/enzyme_1.py
 :linenos:
...
...
@@ -532,8 +536,36 @@ with this algorithm we find if an enzyme cut the dna but we cannot find all cuts
 The latter algorithm displays the number of occurrences of each enzyme, but we cannot determine the position of every site.
 We will see how to do this later.
+Bonus
+"""""
+There is another kind of tuple which allows access to items by index or by name.
+This data collection is called a named tuple. Named tuples are not available directly: they live in the `collections` module,
+so we have to import it before using it.
+We also have to define which name corresponds to which item::
+import collections
+RestrictEnzyme = collections.namedtuple("RestrictEnzyme", ("name", "comment", "sequence", "cut", "end"))
+Then we can use this new kind of tuple::
+ecor1 = RestrictEnzyme("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
+ecor5 = RestrictEnzyme("EcoRV", "Ecoli restriction enzime V", "gatatc", 3, "blunt")
+bamh1 = RestrictEnzyme("BamHI", "type II restriction endonuclease from Bacillus amyloliquefaciens ", "ggatcc", 1, "sticky")
+hind3 = RestrictEnzyme("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1 , "sticky")
+taq1 = RestrictEnzyme("TaqI", "Thermus aquaticus", "tcga", 1 , "sticky")
+not1 = RestrictEnzyme("NotI", "Nocardia otitidis", "gcggccgc", 2 , "sticky")
+sau3a1 = RestrictEnzyme("Sau3aI", "Staphylococcus aureus", "gatc", 0 , "sticky")
+hae3 = RestrictEnzyme("HaeIII", "Haemophilus aegyptius", "ggcc", 2 , "blunt")
+sma1 = RestrictEnzyme("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt")
+The code must be adapted as shown below
+.. literalinclude:: _static/code/enzyme_1_namedtuple.py
+:linenos:
+:language: python
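The Bonus text says a named tuple gives access by index or by name. A short sketch of that equivalence, using the `RestrictEnzyme` definition from the hunk; the variable `plain` is introduced here only for comparison:

```python
import collections

RestrictEnzyme = collections.namedtuple(
    "RestrictEnzyme", ("name", "comment", "sequence", "cut", "end"))

ecor1 = RestrictEnzyme("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")

# The same item is reachable by position or by field name
print(ecor1[2])        # gaattc
print(ecor1.sequence)  # gaattc

# A namedtuple is still a tuple, so it compares equal to a plain tuple
# holding the same items in the same order
plain = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
print(ecor1 == plain)  # True

# The field names are recorded on the class
print(ecor1._fields)
```

Because index access keeps working, code written for plain tuples (`enz[2]`) should also accept named tuples, while the reverse (`enz.sequence`) is not true.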
Exercise
--------
...
...
source/Control_Flow_Statements.rst
...
...
@@ -149,7 +149,7 @@ search all positions of Ecor1 binding sites in dna_1
 import collections
 RestrictEnzyme = collections.namedtuple("RestrictEnzyme", "name comment sequence cut end")
-ecor1 = RestrictEnzyme("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
+ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
dna_1 = """tcgcgcaacgtcgcctacatctcaagattcagcgccgagatccccgggggttgagcgatccccgtcagttggcgtgaattcag
cagcagcgcaccccgggcgtagaattccagttgcagataatagctgatttagttaacttggatcacagaagcttccaga
...
...
@@ -179,19 +179,16 @@ in bonus we can try to sort the list in the order of the position of the binding
 :language: python
 ::
-import collections
-RestrictEnzyme = collections.namedtuple("RestrictEnzyme", "name comment sequence cut end")
-ecor1 = RestrictEnzyme("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
-ecor5 = RestrictEnzyme("EcoRV", "Ecoli restriction enzime V", "gatatc", 3, "blunt")
-bamh1 = RestrictEnzyme("BamHI", "type II restriction endonuclease from Bacillus amyloliquefaciens ", "ggatcc", 1, "sticky")
-hind3 = RestrictEnzyme("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1 , "sticky")
-taq1 = RestrictEnzyme("TaqI", "Thermus aquaticus", "tcga", 1 , "sticky")
-not1 = RestrictEnzyme("NotI", "Nocardia otitidis", "gcggccgc", 2 , "sticky")
-sau3a1 = RestrictEnzyme("Sau3aI", "Staphylococcus aureus", "gatc", 0 , "sticky")
-hae3 = RestrictEnzyme("HaeIII", "Haemophilus aegyptius", "ggcc", 2 , "blunt")
-sma1 = RestrictEnzyme("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt")
+ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
+ecor5 = ("EcoRV", "Ecoli restriction enzime V", "gatatc", 3, "blunt")
+bamh1 = ("BamHI", "type II restriction endonuclease from Bacillus amyloliquefaciens ", "ggatcc", 1, "sticky")
+hind3 = ("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1 , "sticky")
+taq1 = ("TaqI", "Thermus aquaticus", "tcga", 1 , "sticky")
+not1 = ("NotI", "Nocardia otitidis", "gcggccgc", 2 , "sticky")
+sau3a1 = ("Sau3aI", "Staphylococcus aureus", "gatc", 0 , "sticky")
+hae3 = ("HaeIII", "Haemophilus aegyptius", "ggcc", 2 , "blunt")
+sma1 = ("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt")
 and the 2 dna fragments: ::
...
...
@@ -214,6 +211,13 @@ and the 2 dna fragments: ::
 :download:`restriction.py <_static/code/restriction.py>` .
+Bonus
+"""""
+If you prefer the enzymes implemented as namedtuples:
+:download:`restriction.py <_static/code/restriction_namedtuple.py>` .
 Exercise
 --------
...
...
source/_static/code/enzyme_1.py
...
...
@@ -6,7 +6,7 @@ def one_line(seq):
 def enz_filter(enzymes, dna):
     cuting_enz = []
     for enz in enzymes:
-        if enz.sequence in dna:
+        if enz[2] in dna:
             cuting_enz.append(enz)
     return cuting_enz
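To illustrate how the tuple-based `enz_filter` fits the last step of the exercise, here is a minimal sketch. The fragments `frag_a` and `frag_b` are short hypothetical sequences made up for this example, not the `dna_1` and `dna_2` of the course, and `one_line` is assumed to behave like the helper named in the hunk header (it removes newlines).

```python
def one_line(seq):
    """Remove newlines from a multi-line sequence."""
    return seq.replace('\n', '')


def enz_filter(enzymes, dna):
    """Return the enzymes whose recognition sequence (index 2) occurs in dna."""
    cuting_enz = []
    for enz in enzymes:
        if enz[2] in dna:
            cuting_enz.append(enz)
    return cuting_enz


ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
ecor5 = ("EcoRV", "Ecoli restriction enzime V", "gatatc", 3, "blunt")
sma1 = ("SmaI", "Serratia marcescens", "cccggg", 3, "blunt")
enzymes = [ecor1, ecor5, sma1]

# Hypothetical fragments. In frag_a the EcoRI site is split by the line
# break, so it is only found once the sequence is put on one line.
frag_a = """ttgaat
tcaaccgatatcgg"""
frag_b = """ttgaattcaa
aacccgggtt"""

cut_a = enz_filter(enzymes, one_line(frag_a))
cut_b = enz_filter(enzymes, one_line(frag_b))

# Tuples are hashable, so a set difference gives the enzymes
# which cut frag_a but not frag_b
only_a = set(cut_a) - set(cut_b)
print([enz[0] for enz in only_a])
```

A set loses the original order of the enzymes; if the order matters, a list comprehension such as `[e for e in cut_a if e not in cut_b]` is an alternative.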
source/_static/code/enzyme_1_namedtuple.py
0 → 100644
+def one_line(seq):
+    return seq.replace('\n', '')
+
+
+def enz_filter(enzymes, dna):
+    cuting_enz = []
+    for enz in enzymes:
+        if enz.sequence in dna:
+            cuting_enz.append(enz)
+    return cuting_enz
source/_static/code/restriction.py
 from operator import itemgetter
+# we decide to implement enzyme as tuple with the following structure
+# ("name", "comment", "sequence", "cut", "end")
+#    0         1          2         3      4
 def one_enz_one_binding_site(dna, enzyme):
     """
     :return: the first position of enzyme binding site in dna or None if there is not
     :rtype: int or None
     """
-    print("one_enz_binding_one_site", dna, enzyme)
-    pos = dna.find(enzyme.sequence)
-    print("one_enz_binding_one_site", pos)
+    pos = dna.find(enzyme[2])
     if pos != -1:
         return pos
...
...
@@ -23,10 +24,10 @@ def one_enz_all_binding_sites(dna, enzyme):
     :rtype: list of int
     """
     positions = []
-    pos = dna.find(enzyme.sequence)
+    pos = dna.find(enzyme[2])
     while pos != -1:
         positions.append(pos)
-        pos = dna.find(enzyme.sequence, pos + 1)
+        pos = dna.find(enzyme[2], pos + 1)
     return positions
...
...
@@ -40,14 +41,14 @@ def one_enz_all_binding_sites2(dna, enzyme):
     :rtype: list of int
     """
     positions = []
-    pos = dna.find(enzyme.sequence)
+    pos = dna.find(enzyme[2])
     while pos != -1:
         if positions:
             positions.append(pos)
         else:
             positions = pos + positions[-1]
         new_seq = dna[pos + 1:]
-        pos = new_seq.find(enzyme.sequence)
+        pos = new_seq.find(enzyme[2])
         pos = pos
     return positions
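As shown in this hunk, the `else` branch of `one_enz_all_binding_sites2` indexes `positions[-1]` while the list is still empty, so the first match would raise an `IndexError`; the positions found in `new_seq` are also relative to the slice rather than to `dna`. One possible corrected sketch of the same slice-by-slice idea, assuming the tuple layout of this commit. The function name `all_sites_by_slicing` and the test sequence are illustrative, not taken from the repository:

```python
def all_sites_by_slicing(dna, enzyme):
    """Return every start position of enzyme[2] in dna, searching slice by slice."""
    site = enzyme[2]
    positions = []
    offset = 0  # index in dna where the current slice starts
    pos = dna.find(site)
    while pos != -1:
        # pos is relative to the current slice: convert it to an index in dna
        positions.append(offset + pos)
        # the next slice starts just after the beginning of this match
        offset += pos + 1
        pos = dna[offset:].find(site)
    return positions


ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
dna = "gaattcttgaattcgaattc"  # hypothetical sequence with three EcoRI sites
print(all_sites_by_slicing(dna, ecor1))  # [0, 8, 14]
```

This should return the same positions as the `dna.find(site, pos + 1)` loop of `one_enz_all_binding_sites`, which avoids building a new string on every iteration.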
...
...
@@ -67,7 +68,7 @@ def binding_sites(dna, enzymes):
     positions = []
     for enzyme in enzymes:
         pos = one_enz_all_binding_sites(dna, enzyme)
-        pos = [(enzyme.name, pos) for pos in pos]
+        pos = [(enzyme[0], pos) for pos in pos]
         positions.extend(pos)
     positions.sort(key=itemgetter(1))
     return positions
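A small usage sketch may help to see what `binding_sites` returns once the enzymes are plain tuples. The two functions below restate the tuple-based versions from the hunks above so that the example is self-contained; the `dna` string is a hypothetical fragment, not one of the course sequences:

```python
from operator import itemgetter


def one_enz_all_binding_sites(dna, enzyme):
    """Return all start positions of enzyme[2] in dna."""
    positions = []
    pos = dna.find(enzyme[2])
    while pos != -1:
        positions.append(pos)
        pos = dna.find(enzyme[2], pos + 1)
    return positions


def binding_sites(dna, enzymes):
    """Return (enzyme name, position) pairs sorted by increasing position."""
    positions = []
    for enzyme in enzymes:
        pos = one_enz_all_binding_sites(dna, enzyme)
        positions.extend((enzyme[0], p) for p in pos)
    # itemgetter(1) sorts on the second item of each pair, the position
    positions.sort(key=itemgetter(1))
    return positions


ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
bamh1 = ("BamHI", "type II restriction endonuclease from Bacillus amyloliquefaciens ", "ggatcc", 1, "sticky")
sau3a1 = ("Sau3aI", "Staphylococcus aureus", "gatc", 0, "sticky")

dna = "ttgaattcaaggatccttgaattc"  # hypothetical fragment
print(binding_sites(dna, [ecor1, bamh1, sau3a1]))
# [('EcoRI', 2), ('BamHI', 10), ('Sau3aI', 11), ('EcoRI', 18)]
```

The Sau3aI site at position 11 lies inside the BamHI site at position 10, which is why the two enzymes appear next to each other in the sorted result.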
...
...
source/_static/code/restriction_namedtuple.py
0 → 100644
+##################################################
+# Solution using namedtuple, not a classic tuple #
+# see how the code is more readable              #
+##################################################
+
+import collections
+
+RestrictEnzyme = collections.namedtuple("RestrictEnzyme", ("name", "comment", "sequence", "cut", "end"))
+
+ecor1 = RestrictEnzyme("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
+ecor5 = RestrictEnzyme("EcoRV", "Ecoli restriction enzime V", "gatatc", 3, "blunt")
+bamh1 = RestrictEnzyme("BamHI", "type II restriction endonuclease from Bacillus amyloliquefaciens ", "ggatcc", 1, "sticky")
+hind3 = RestrictEnzyme("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1, "sticky")
+taq1 = RestrictEnzyme("TaqI", "Thermus aquaticus", "tcga", 1, "sticky")
+not1 = RestrictEnzyme("NotI", "Nocardia otitidis", "gcggccgc", 2, "sticky")
+sau3a1 = RestrictEnzyme("Sau3aI", "Staphylococcus aureus", "gatc", 0, "sticky")
+hae3 = RestrictEnzyme("HaeIII", "Haemophilus aegyptius", "ggcc", 2, "blunt")
+sma1 = RestrictEnzyme("SmaI", "Serratia marcescens", "cccggg", 3, "blunt")
+
+from operator import itemgetter
+
+
+def one_enz_one_binding_site(dna, enzyme):
+    """
+    :return: the first position of the enzyme binding site in dna, or None if there is none
+    :rtype: int or None
+    """
+    print("one_enz_binding_one_site", dna, enzyme)
+    pos = dna.find(enzyme.sequence)
+    print("one_enz_binding_one_site", pos)
+    if pos != -1:
+        return pos
+
+
+def one_enz_all_binding_sites(dna, enzyme):
+    """
+    :param dna: the dna sequence to search for enzyme binding sites
+    :type dna: str
+    :param enzyme: the enzyme to look for
+    :type enzyme: a namedtuple RestrictEnzyme
+    :return: all positions of enzyme binding sites in dna
+    :rtype: list of int
+    """
+    positions = []
+    pos = dna.find(enzyme.sequence)
+    while pos != -1:
+        positions.append(pos)
+        pos = dna.find(enzyme.sequence, pos + 1)
+    return positions
+
+
+def one_enz_all_binding_sites2(dna, enzyme):
+    """
+    :param dna: the dna sequence to search for enzyme binding sites
+    :type dna: str
+    :param enzyme: the enzyme to look for
+    :type enzyme: a namedtuple RestrictEnzyme
+    :return: all positions of enzyme binding sites in dna
+    :rtype: list of int
+    """
+    positions = []
+    pos = dna.find(enzyme.sequence)
+    while pos != -1:
+        if positions:
+            positions.append(pos)
+        else:
+            positions = pos + positions[-1]
+        new_seq = dna[pos + 1:]
+        pos = new_seq.find(enzyme.sequence)
+        pos = pos
+    return positions
+
+
+def binding_sites(dna, enzymes):
+    """
+    return all positions of all enzymes binding sites present in dna,
+    sorted by increasing position.
+    :param dna: the dna sequence to search for enzyme binding sites
+    :type dna: str
+    :param enzymes: the enzymes to look for
+    :type enzymes: a sequence of namedtuple RestrictEnzyme
+    :return: the (enzyme name, position) pair of each enzyme binding site in dna
+    :rtype: list of (str, int) tuples
+    """
+    positions = []
+    for enzyme in enzymes:
+        pos = one_enz_all_binding_sites(dna, enzyme)
+        pos = [(enzyme.name, pos) for pos in pos]
+        positions.extend(pos)
+    positions.sort(key=itemgetter(1))
+    return positions