Unverified commit fcff438e, authored by Bertrand NÉRON

Merge branch 'master' of gitlab.pasteur.fr:hub-courses/python_one_week_4_biologists_solutions

parents c1280456 5cfab960
Pipeline #58630 passed in 34 seconds
......@@ -351,7 +351,6 @@ Compare the pseudocode of each of them and implement the fastest one. ::
acggcaacatggctggccagtgggctctgagaggagaaagtccagtggatgctcttggtctggttcgtgagcgcaacaca"""
<<<<<<< HEAD
In the first algorithm.
| we first compute all kmers we generate 4\ :sup:`kmer length`
......@@ -359,15 +358,6 @@ In the first algorithm.
| so for each kmer we read all the sequence so the algorithm is in O( 4\ :sup:`kmer length` * ``sequence length``)
| In the second algorithm we read the sequence only once
=======
In the first algorithm.
| we first compute all kmers we generate 4\ :sup:`kmer length`
| then we count the occurrence of each kmer in the sequence
| so for each kmer we read all the sequence so the algorithm is in O( 4\ :sup:`kmer length` * ``sequence length``)
| In the secon algorithm we read the sequence only once
>>>>>>> e986fb63db27fe063adb907bfb916dbb79c5db9b
| So the algorithm is in O(sequence length)
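
The single-pass idea of the second algorithm can be sketched as follows. This is a minimal illustration, not the course's reference solution: the function name ``count_kmers`` and the use of ``defaultdict`` are assumptions, and the O(sequence length) bound treats the kmer length as a constant.

```python
from collections import defaultdict


def count_kmers(sequence, k):
    """Count every kmer of length k by reading the sequence only once."""
    # The sequences are written over several lines; drop the line breaks.
    sequence = sequence.replace("\n", "")
    counts = defaultdict(int)
    # A sequence of length n contains n - k + 1 windows of length k.
    for start in range(len(sequence) - k + 1):
        counts[sequence[start:start + k]] += 1
    return dict(counts)


print(count_kmers("acgtacg", 3))
```

Only the kmers that actually occur in the sequence are stored, so the 4\ :sup:`kmer length` candidates of the first algorithm are never generated.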
......@@ -546,6 +536,7 @@ and the 2 dna fragments: ::
:language: python
::
from enzyme_1 import *
enzymes = [ecor1, ecor5, bamh1, hind3, taq1, not1, sau3a1, hae3, sma1]
......
......@@ -20,11 +20,11 @@ The Fibonacci sequence are the numbers in the following integer sequence:
By definition, the first two numbers in the Fibonacci sequence are 0 and 1,
and each subsequent number is the sum of the previous two.
The fibonacci suite can be defined as following:
The Fibonacci suite can be defined as following:
| F\ :sub:`0` = 0, F\ :sub:`1` = 1.
| F\ :sub:`0` = 0, F\ :sub:`1` = 1.
|
| F\ :sub:`n` = F\ :sub:`n-1` + F\ :sub:`n-2`
| F\ :sub:`n` = F\ :sub:`n-1` + F\ :sub:`n-2`
Write a function which take an integer ``n`` as parameter
and returns a list containing the ``n`` first number of the Fibonacci sequence.
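
The definition above translates almost directly into a loop. The sketch below is one possible iterative version; it is not necessarily what ``fibonacci_iteration.py`` contains, and the function name ``fibonacci`` is an assumption.

```python
def fibonacci(n):
    """Return a list with the first n numbers of the Fibonacci sequence."""
    sequence = []
    # F0 = 0 and F1 = 1 by definition.
    current, following = 0, 1
    for _ in range(n):
        sequence.append(current)
        # Fn = Fn-1 + Fn-2: slide the pair one step forward.
        current, following = following, current + following
    return sequence


print(fibonacci(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```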
......@@ -35,7 +35,7 @@ and returns a list containing the ``n`` first number of the Fibonacci sequence.
:language: python
:download:`fibonacci_iteration.py <_static/code/fibonacci_iteration.py>` .
We will see another way more elegant to implement the fibonacci suite in :ref:`Advanced Programming Techniques` section.
We will see another way more elegant to implement the Fibonacci suite in :ref:`Advanced Programming Techniques` section.
......@@ -66,56 +66,56 @@ implementation
def my_max(seq):
"""
return the maximum value in a sequence
return the maximum value in a sequence
work only with integer or float
"""
higest = seq[0]
highest = seq[0]
for i in seq:
if i > highest:
highest = i
return highest
l = [1,2,3,4,58,9]
print my_max(l)
l = [1, 2, 3, 4, 58, 9]
print(my_max(l))
58
.. _enzyme_exercise:
Exercise
--------
| We want to establish a restriction map of a sequence.
| But we will do this step by step.
| and reuse the enzymes used in previous chapter:
| We want to establish a restriction map of a sequence.
| But we will do this step by step,
| and reuse the enzymes used in previous chapter:
* create a function that take a sequence and an enzyme as parameter and return
the position of first binding sites.
(write the pseudocode)
* Create a function that takes a sequence and an enzyme as parameters, and returns
the position of the first binding site.
(Write the pseudocode.)
**pseudocode**
**pseudocode**
| *function one_enz_binding_site(dna, enzyme)*
| *if enzyme binding site is substring of dna*
| *return of first position of substring in dna*
| *return of first position of substring in dna*
**implementation**
.. literalinclude:: _static/code/restriction.py
:linenos:
:lines: 1-16
:language: python
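
The included file is not shown in this diff. As a rough sketch of the pseudocode, assuming the enzyme is a tuple whose third field is the binding site (as in the ``ecor1`` tuple used in this chapter) and that positions are 0-based:

```python
def one_enz_binding_site(dna, enzyme):
    """Return the 0-based position of the first binding site, or None."""
    # Assumption: the binding site is the third field of the enzyme tuple.
    site = enzyme[2]
    # The sequences are written over several lines; drop the line breaks.
    dna = dna.replace("\n", "")
    position = dna.find(site)  # str.find returns -1 when there is no match
    return position if position != -1 else None


ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
print(one_enz_binding_site("aagaattcgg", ecor1))  # 2
```

Returning ``None`` when the site is absent is a design choice of this sketch; the actual ``restriction.py`` may handle that case differently.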
* improve the previous function to return all positions of binding sites
* Improve the previous function to return all positions of binding sites.
**pseudocode of first algorithm**
| *function one_enz_binding_sites(dna, enzyme)*
| *positions <- empty*
| *if enzyme binding site is substring of dna*
| *add the position of the first substring in dna in positions*
| *add the position of the first substring in dna in positions*
| *positions <- find binding_sites in rest of dna sequence*
| *return positions*
| *return positions*
**implementation**
......@@ -140,21 +140,21 @@ Exercise
:linenos:
:lines: 34-56
:language: python
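
Collecting every position can also be written without recursion. The sketch below is an assumption about how such a function might look, not the content of ``restriction.py``; it restarts the search one character after each hit, so overlapping sites would also be reported, and positions are 0-based.

```python
def one_enz_binding_sites(dna, enzyme):
    """Return the 0-based positions of all binding sites of one enzyme."""
    # Assumption: the binding site is the third field of the enzyme tuple.
    site = enzyme[2]
    dna = dna.replace("\n", "")
    positions = []
    position = dna.find(site)
    while position != -1:
        positions.append(position)
        # Search the rest of the sequence, starting just after this hit.
        position = dna.find(site, position + 1)
    return positions


ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
print(one_enz_binding_sites("gaattcaagaattc", ecor1))  # [0, 8]
```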
search all positions of Ecor1 binding sites in dna_1
* Search all positions of Ecor1 binding sites in ``dna_1``.
::
ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
dna_1 = """tcgcgcaacgtcgcctacatctcaagattcagcgccgagatccccgggggttgagcgatccccgtcagttggcgtgaattcag
cagcagcgcaccccgggcgtagaattccagttgcagataatagctgatttagttaacttggatcacagaagcttccaga
ccaccgtatggatcccaacgcactgttacggatccaattcgtacgtttggggtgatttgattcccgctgcctgccagg"""
* generalize the binding sites function to take a list of enzymes and return a list of tuple (enzyme name, position)
* Generalize the binding sites function to take a list of enzymes and return a list of tuples (enzyme name, position).
**pseudocode**
| *function binding_sites(dna, set of enzymes)*
......@@ -167,14 +167,15 @@ search all positions of Ecor1 binding sites in dna_1
**implementation**
in bonus we can try to sort the list in the order of the position of the binding sites like this:
[('Sau3aI', 38), ('SmaI', 42), ('Sau3aI', 56), ('EcoRI', 75), ...
In bonus, we can try to sort the list in the order of the position of the binding sites like this::
[('Sau3aI', 38), ('SmaI', 42), ('Sau3aI', 56), ('EcoRI', 75), ...
.. literalinclude:: _static/code/restriction.py
:linenos:
:lines: 57-
:language: python
::
ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
......@@ -187,7 +188,7 @@ in bonus we can try to sort the list in the order of the position of the binding
hae3 = ("HaeIII", "Haemophilus aegyptius", "ggcc", 2 , "blunt")
sma1 = ("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt")
and the 2 dna fragments: ::
and the two dna fragments: ::
dna_1 = """tcgcgcaacgtcgcctacatctcaagattcagcgccgagatccccgggggttgagcgatccccgtcagttggcgtgaattcag
cagcagcgcaccccgggcgtagaattccagttgcagataatagctgatttagttaacttggatcacagaagcttccaga
......@@ -205,7 +206,7 @@ and the 2 dna fragments: ::
binding_sites(dna_2, enzymes)
[('EcoRI', 11), ('NotI', 33), ('HaeIII', 35), ('EcoRI', 98), ('SmaI', 106),
('EcoRI', 179), ('HaeIII', 193), ('EcoRV', 225)]
:download:`restriction.py <_static/code/restriction.py>` .
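
For readers who do not have the file at hand, the generalized function with the bonus sorting might look like the sketch below. It is an illustration under assumptions: enzymes are tuples with the name first and the binding site third, and positions are 0-based indexes into the sequence with line breaks removed, so they may not match the numbering used in the output shown above.

```python
from operator import itemgetter


def binding_sites(dna, enzymes):
    """Return (enzyme name, position) tuples sorted by position."""
    dna = dna.replace("\n", "")
    sites = []
    for enzyme in enzymes:
        # Assumption: name is field 0 and binding site is field 2.
        name, site = enzyme[0], enzyme[2]
        position = dna.find(site)
        while position != -1:
            sites.append((name, position))
            position = dna.find(site, position + 1)
    # Bonus: order the hits by their position on the sequence.
    sites.sort(key=itemgetter(1))
    return sites


ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
sma1 = ("SmaI", "Serratia marcescens", "cccggg", 3, "blunt")
print(binding_sites("cccgggaagaattc", [ecor1, sma1]))
# [('SmaI', 0), ('EcoRI', 8)]
```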
Bonus
......@@ -219,23 +220,22 @@ If you prefer the enzyme implemented as namedtuple:
Exercise
--------
From a list return a new list without any duplicate, but keeping the order of items.
For example: ::
Write a ``uniqify_with_order`` function that takes a list and returns a new list without any duplicate, but keeping the order of items.
For instance::
>>> l = [5,2,3,2,2,3,5,1]
>>> l = [5, 2, 3, 2, 2, 3, 5, 1]
>>> uniqify_with_order(l)
>>> [5,2,3,1]
[5, 2, 3, 1]
solution ::
Solution ::
>>> uniq = []
>>> for item in l:
...     if item not in uniq:
...         uniq.append(item)
solution ::
Solution ::
>>> uniq_items = set()
>>> l_uniq = [x for x in l if x not in uniq_items and not uniq_items.add(x)]
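
Neither solution above defines the ``uniqify_with_order`` function called in the example. One way to wrap the idea in a function is sketched below; it assumes the items are hashable, since it tracks them in a set.

```python
def uniqify_with_order(items):
    """Return a new list without duplicates, keeping first-seen order."""
    seen = set()
    unique = []
    for item in items:
        # Set membership is tested in constant time on average,
        # unlike the "item not in uniq" test on a list.
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


l = [5, 2, 3, 2, 2, 3, 5, 1]
print(uniqify_with_order(l))  # [5, 2, 3, 1]
```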