Commit fcff438e authored by Bertrand NÉRON

Merge branch 'master' of gitlab.pasteur.fr:hub-courses/python_one_week_4_biologists_solutions

parents c1280456 5cfab960
Pipeline #58630 passed with stages in 34 seconds
...@@ -351,7 +351,6 @@ Compare the pseudocode of each of them and implement the fastest one. :: ...@@ -351,7 +351,6 @@ Compare the pseudocode of each of them and implement the fastest one. ::
acggcaacatggctggccagtgggctctgagaggagaaagtccagtggatgctcttggtctggttcgtgagcgcaacaca""" acggcaacatggctggccagtgggctctgagaggagaaagtccagtggatgctcttggtctggttcgtgagcgcaacaca"""
<<<<<<< HEAD
In the first algorithm. In the first algorithm.
| we first compute all kmers we generate 4\ :sup:`kmer length` | we first compute all kmers we generate 4\ :sup:`kmer length`
...@@ -359,15 +358,6 @@ In the first algorithm. ...@@ -359,15 +358,6 @@ In the first algorithm.
| so for each kmer we read all the sequence so the algorithm is in O( 4\ :sup:`kmer length` * ``sequence length``) | so for each kmer we read all the sequence so the algorithm is in O( 4\ :sup:`kmer length` * ``sequence length``)
| In the second algorithm we read the sequence only once | In the second algorithm we read the sequence only once
=======
In the first algorithm.
| we first compute all kmers we generate 4\ :sup:`kmer length`
| then we count the occurrence of each kmer in the sequence
| so for each kmer we read all the sequence so the algorithm is in O( 4\ :sup:`kmer length` * ``sequence length``)
| In the secon algorithm we read the sequence only once
>>>>>>> e986fb63db27fe063adb907bfb916dbb79c5db9b
| So the algorithm is in O(sequence length) | So the algorithm is in O(sequence length)
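For reference, the single-pass approach described in this hunk could be sketched as follows. This is only an illustration, not the code from the course files: the name `count_kmers` is hypothetical, and the sequence is assumed to be a plain string with newlines already removed. The window slice itself costs O(k), so the O(sequence length) estimate holds when k is treated as a small constant.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every k-mer by sliding a window of size k over the sequence once.

    Hypothetical helper: the name and signature are assumptions made for
    this illustration.
    """
    counts = Counter()
    # There are len(sequence) - k + 1 windows; each one is visited exactly once.
    for start in range(len(sequence) - k + 1):
        counts[sequence[start:start + k]] += 1
    return counts


counts = count_kmers("acggcaac", 2)
print(counts.most_common(1))
```

Only k-mers that actually occur in the sequence are stored, so there is no need to generate the 4\ :sup:`kmer length` candidates first.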
...@@ -546,6 +536,7 @@ and the 2 dna fragments: :: ...@@ -546,6 +536,7 @@ and the 2 dna fragments: ::
:language: python :language: python
:: ::
from enzyme_1 import * from enzyme_1 import *
enzymes = [ecor1, ecor5, bamh1, hind3, taq1, not1, sau3a1, hae3, sma1] enzymes = [ecor1, ecor5, bamh1, hind3, taq1, not1, sau3a1, hae3, sma1]
......
...@@ -20,11 +20,11 @@ The Fibonacci sequence are the numbers in the following integer sequence: ...@@ -20,11 +20,11 @@ The Fibonacci sequence are the numbers in the following integer sequence:
By definition, the first two numbers in the Fibonacci sequence are 0 and 1, By definition, the first two numbers in the Fibonacci sequence are 0 and 1,
and each subsequent number is the sum of the previous two. and each subsequent number is the sum of the previous two.
The fibonacci suite can be defined as following: The Fibonacci suite can be defined as following:
| F\ :sub:`0` = 0, F\ :sub:`1` = 1. | F\ :sub:`0` = 0, F\ :sub:`1` = 1.
| |
| F\ :sub:`n` = F\ :sub:`n-1` + F\ :sub:`n-2` | F\ :sub:`n` = F\ :sub:`n-1` + F\ :sub:`n-2`
Write a function which take an integer ``n`` as parameter Write a function which take an integer ``n`` as parameter
and returns a list containing the ``n`` first number of the Fibonacci sequence. and returns a list containing the ``n`` first number of the Fibonacci sequence.
...@@ -35,7 +35,7 @@ and returns a list containing the ``n`` first number of the Fibonacci sequence. ...@@ -35,7 +35,7 @@ and returns a list containing the ``n`` first number of the Fibonacci sequence.
:language: python :language: python
:download:`fibonacci_iteration.py <_static/code/fibonacci_iteration.py>` . :download:`fibonacci_iteration.py <_static/code/fibonacci_iteration.py>` .
We will see another way more elegant to implement the fibonacci suite in :ref:`Advanced Programming Techniques` section. We will see another way more elegant to implement the Fibonacci suite in :ref:`Advanced Programming Techniques` section.
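The content of `fibonacci_iteration.py` is not shown in this diff. A minimal iterative sketch that matches the exercise statement, assuming a function named `fibonacci` (hypothetical) and that `n` counts the terms starting from F\ :sub:`0`, might look like this:

```python
def fibonacci(n):
    """Return a list with the first n numbers of the Fibonacci sequence.

    Assumption: the list starts at F0 = 0, so fibonacci(1) is [0].
    """
    fib = []
    a, b = 0, 1  # F0 and F1
    for _ in range(n):
        fib.append(a)
        a, b = b, a + b  # slide the (F(i), F(i+1)) pair one step forward
    return fib


print(fibonacci(10))
```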
...@@ -66,56 +66,56 @@ implementation ...@@ -66,56 +66,56 @@ implementation
def my_max(seq): def my_max(seq):
""" """
return the maximum value in a sequence return the maximum value in a sequence
work only with integer or float work only with integer or float
""" """
higest = seq[0] highest = seq[0]
for i in seq: for i in seq:
if i > highest: if i > highest:
highest = i highest = i
return highest return highest
l = [1,2,3,4,58,9] l = [1, 2, 3, 4, 58, 9]
print my_max(l) print(my_max(l))
58 58
.. _enzyme_exercise: .. _enzyme_exercise:
Exercise Exercise
-------- --------
| We want to establish a restriction map of a sequence. | We want to establish a restriction map of a sequence.
| But we will do this step by step. | But we will do this step by step,
| and reuse the enzymes used in previous chapter: | and reuse the enzymes used in previous chapter:
* create a function that take a sequence and an enzyme as parameter and return * Create a function that takes a sequence and an enzyme as parameters, and returns
the position of first binding sites. the position of the first binding site.
(write the pseudocode) (Write the pseudocode.)
**pseudocode**
**pseudocode**
| *function one_enz_binding_site(dna, enzyme)* | *function one_enz_binding_site(dna, enzyme)*
| *if enzyme binding site is substring of dna* | *if enzyme binding site is substring of dna*
| *return of first position of substring in dna* | *return of first position of substring in dna*
**implementation** **implementation**
.. literalinclude:: _static/code/restriction.py .. literalinclude:: _static/code/restriction.py
:linenos: :linenos:
:lines: 1-16 :lines: 1-16
:language: python :language: python
* improve the previous function to return all positions of binding sites * Improve the previous function to return all positions of binding sites.
**pseudocode of first algorithm** **pseudocode of first algorithm**
| *function one_enz_binding_sites(dna, enzyme)* | *function one_enz_binding_sites(dna, enzyme)*
| *positions <- empty* | *positions <- empty*
| *if enzyme binding site is substring of dna* | *if enzyme binding site is substring of dna*
| *add the position of the first substring in dna in positions* | *add the position of the first substring in dna in positions*
| *positions <- find binding_sites in rest of dna sequence* | *positions <- find binding_sites in rest of dna sequence*
| *return positions* | *return positions*
**implementation** **implementation**
...@@ -140,21 +140,21 @@ Exercise ...@@ -140,21 +140,21 @@ Exercise
:linenos: :linenos:
:lines: 34-56 :lines: 34-56
:language: python :language: python
search all positions of Ecor1 binding sites in dna_1 * Search all positions of Ecor1 binding sites in ``dna_1``.
:: ::
ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky") ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
dna_1 = """tcgcgcaacgtcgcctacatctcaagattcagcgccgagatccccgggggttgagcgatccccgtcagttggcgtgaattcag dna_1 = """tcgcgcaacgtcgcctacatctcaagattcagcgccgagatccccgggggttgagcgatccccgtcagttggcgtgaattcag
cagcagcgcaccccgggcgtagaattccagttgcagataatagctgatttagttaacttggatcacagaagcttccaga cagcagcgcaccccgggcgtagaattccagttgcagataatagctgatttagttaacttggatcacagaagcttccaga
ccaccgtatggatcccaacgcactgttacggatccaattcgtacgtttggggtgatttgattcccgctgcctgccagg""" ccaccgtatggatcccaacgcactgttacggatccaattcgtacgtttggggtgatttgattcccgctgcctgccagg"""
* generalize the binding sites function to take a list of enzymes and return a list of tuple (enzyme name, position) * Generalize the binding sites function to take a list of enzymes and return a list of tuples (enzyme name, position).
**pseudocode** **pseudocode**
| *function binding_sites(dna, set of enzymes)* | *function binding_sites(dna, set of enzymes)*
...@@ -167,14 +167,15 @@ search all positions of Ecor1 binding sites in dna_1 ...@@ -167,14 +167,15 @@ search all positions of Ecor1 binding sites in dna_1
**implementation** **implementation**
in bonus we can try to sort the list in the order of the position of the binding sites like this: In bonus, we can try to sort the list in the order of the position of the binding sites like this::
[('Sau3aI', 38), ('SmaI', 42), ('Sau3aI', 56), ('EcoRI', 75), ...
[('Sau3aI', 38), ('SmaI', 42), ('Sau3aI', 56), ('EcoRI', 75), ...
.. literalinclude:: _static/code/restriction.py .. literalinclude:: _static/code/restriction.py
:linenos: :linenos:
:lines: 57- :lines: 57-
:language: python :language: python
:: ::
ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky") ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
...@@ -187,7 +188,7 @@ in bonus we can try to sort the list in the order of the position of the binding ...@@ -187,7 +188,7 @@ in bonus we can try to sort the list in the order of the position of the binding
hae3 = ("HaeIII", "Haemophilus aegyptius", "ggcc", 2 , "blunt") hae3 = ("HaeIII", "Haemophilus aegyptius", "ggcc", 2 , "blunt")
sma1 = ("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt") sma1 = ("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt")
and the 2 dna fragments: :: and the two dna fragments: ::
dna_1 = """tcgcgcaacgtcgcctacatctcaagattcagcgccgagatccccgggggttgagcgatccccgtcagttggcgtgaattcag dna_1 = """tcgcgcaacgtcgcctacatctcaagattcagcgccgagatccccgggggttgagcgatccccgtcagttggcgtgaattcag
cagcagcgcaccccgggcgtagaattccagttgcagataatagctgatttagttaacttggatcacagaagcttccaga cagcagcgcaccccgggcgtagaattccagttgcagataatagctgatttagttaacttggatcacagaagcttccaga
...@@ -205,7 +206,7 @@ and the 2 dna fragments: :: ...@@ -205,7 +206,7 @@ and the 2 dna fragments: ::
binding_sites(dna_2, enzymes) binding_sites(dna_2, enzymes)
[('EcoRI', 11), ('NotI', 33), ('HaeIII', 35), ('EcoRI', 98), ('SmaI', 106), [('EcoRI', 11), ('NotI', 33), ('HaeIII', 35), ('EcoRI', 98), ('SmaI', 106),
('EcoRI', 179), ('HaeIII', 193), ('EcoRV', 225)] ('EcoRI', 179), ('HaeIII', 193), ('EcoRV', 225)]
:download:`restriction.py <_static/code/restriction.py>` . :download:`restriction.py <_static/code/restriction.py>` .
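The body of `restriction.py` is not part of this diff. As a rough sketch of the idea behind the pseudocode, the version below scans with `str.find` in a loop instead of searching the rest of the sequence recursively. It assumes enzymes are tuples laid out as in the examples (name first, binding site third), that the DNA string contains no newlines, and that positions are 0-based; all three are assumptions, and the short `toy_dna` string is made up for the illustration.

```python
def one_enz_binding_sites(dna, enzyme):
    """Return every position where the enzyme binding site starts in dna.

    Assumption: enzyme is a tuple whose third item is the binding site.
    """
    site = enzyme[2]
    positions = []
    pos = dna.find(site)
    while pos != -1:
        positions.append(pos)
        # Resume one base further so overlapping sites are also reported.
        pos = dna.find(site, pos + 1)
    return positions


def binding_sites(dna, enzymes):
    """Return (enzyme name, position) tuples sorted by position."""
    sites = []
    for enzyme in enzymes:
        for pos in one_enz_binding_sites(dna, enzyme):
            sites.append((enzyme[0], pos))
    return sorted(sites, key=lambda hit: hit[1])


ecor1 = ("EcoRI", "Ecoli restriction enzyme I", "gaattc", 1, "sticky")
sma1 = ("SmaI", "Serratia marcescens", "cccggg", 3, "blunt")
toy_dna = "ttgaattcaacccgggttgaattc"  # made-up fragment, not dna_1 or dna_2
print(binding_sites(toy_dna, [ecor1, sma1]))
```

Sorting on the second item of each tuple gives the "bonus" ordering by position mentioned in the exercise.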
Bonus Bonus
...@@ -219,23 +220,22 @@ If you prefer the enzyme implemented as namedtuple: ...@@ -219,23 +220,22 @@ If you prefer the enzyme implemented as namedtuple:
Exercise Exercise
-------- --------
From a list return a new list without any duplicate, but keeping the order of items. Write a ``uniqify_with_order`` function that takes a list and returns a new list without any duplicate, but keeping the order of items.
For example: :: For instance::
>>> l = [5,2,3,2,2,3,5,1] >>> l = [5, 2, 3, 2, 2, 3, 5, 1]
>>> uniqify_with_order(l) >>> uniqify_with_order(l)
>>> [5,2,3,1] [5, 2, 3, 1]
solution :: Solution ::
>>> uniq = [] >>> uniq = []
>>> for item in l: >>> for item in l:
>>> if item not in uniq: >>> if item not in uniq:
>>> uniq.append(item) >>> uniq.append(item)
solution :: Solution ::
>>> uniq_items = set() >>> uniq_items = set()
>>> l_uniq = [x for x in l if x not in uniq_items and not uniq_items.add(x)] >>> l_uniq = [x for x in l if x not in uniq_items and not uniq_items.add(x)]
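Both solutions above are written as interpreter snippets. Wrapped into the `uniqify_with_order` function that the exercise asks for, the idea could be sketched as below; it assumes the items are hashable, since a set is used to remember what has already been seen.

```python
def uniqify_with_order(items):
    """Return a new list without duplicates, keeping first-seen order.

    Assumption: every item is hashable (needed for the set lookup).
    """
    seen = set()
    unique = []
    for item in items:
        if item not in seen:  # set membership test is O(1) on average
            seen.add(item)
            unique.append(item)
    return unique


l = [5, 2, 3, 2, 2, 3, 5, 1]
print(uniqify_with_order(l))
```

Compared with the first solution, the set avoids rescanning the result list for every item, which matters on long inputs.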