Commit fcff438e by Bertrand NÉRON

Merge branch 'master' of gitlab.pasteur.fr:hub-courses/python_one_week_4_biologists_solutions

parents c1280456 5cfab960
Pipeline #58630 passed with stages in 34 seconds
... ...
@@ -351,7 +351,6 @@ Compare the pseudocode of each of them and implement the fastest one. ::
  acggcaacatggctggccagtgggctctgagaggagaaagtccagtggatgctcttggtctggttcgtgagcgcaacaca"""
- <<<<<<< HEAD
  In the first algorithm.
  | we first compute all kmers we generate 4\ :sup:`kmer length`
... ...
@@ -359,15 +358,6 @@ In the first algorithm.
  | so for each kmer we read all the sequence so the algorithm is in O( 4\ :sup:`kmer length` * ``sequence length``)
  | In the second algorithm we read the sequence only once
- =======
- In the first algorithm.
- | we first compute all kmers we generate 4\ :sup:`kmer length`
- | then we count the occurrence of each kmer in the sequence
- | so for each kmer we read all the sequence so the algorithm is in O( 4\ :sup:`kmer length` * ``sequence length``)
- | In the secon algorithm we read the sequence only once
- >>>>>>> e986fb63db27fe063adb907bfb916dbb79c5db9b
  | So the algorithm is in O(sequence length)
... ...
@@ -546,6 +536,7 @@ and the 2 dna fragments: ::
  :language: python
  ::
  from enzyme_1 import *
  enzymes = [ecor1, ecor5, bamh1, hind3, taq1, not1, sau3a1, hae3, sma1]
... ...
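The complexity argument in the hunks above (counting each of the 4^k possible k-mers separately rereads the sequence every time, whereas the second algorithm reads it once) can be illustrated with a minimal sketch. This is not the course's own solution: the function name `count_kmers` and the toy sequence are assumptions made for illustration only.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping k-mer by walking the sequence once.

    Hypothetical helper, not taken from the course material.
    """
    counts = Counter()
    # One window per start position: len(sequence) - k + 1 windows in total.
    for start in range(len(sequence) - k + 1):
        counts[sequence[start:start + k]] += 1
    return counts


# Toy sequence chosen for illustration only.
kmers = count_kmers("acgtacgt", 3)
print(kmers)
```

Each window is sliced in O(k), so strictly speaking the loop costs O(sequence length * k), but it never enumerates the 4^k candidate k-mers.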
... ...
@@ -20,11 +20,11 @@ The Fibonacci sequence are the numbers in the following integer sequence:
  By definition, the first two numbers in the Fibonacci sequence are 0 and 1,
  and each subsequent number is the sum of the previous two.
- The fibonacci suite can be defined as following:
+ The Fibonacci suite can be defined as following:
- | F\ :sub:`0` = 0, F\ :sub:`1` = 1.
+ | F\ :sub:`0` = 0, F\ :sub:`1` = 1.
  |
- | F\ :sub:`n` = F\ :sub:`n-1` + F\ :sub:`n-2`
+ | F\ :sub:`n` = F\ :sub:`n-1` + F\ :sub:`n-2`
  Write a function which take an integer ``n`` as parameter
  and returns a list containing the ``n`` first number of the Fibonacci sequence.
... ...
@@ -35,7 +35,7 @@ and returns a list containing the ``n`` first number of the Fibonacci sequence.
  :language: python
  :download:`fibonacci_iteration.py <_static/code/fibonacci_iteration.py>` .
- We will see another way more elegant to implement the fibonacci suite in :ref:`Advanced Programming Techniques` section.
+ We will see another way more elegant to implement the Fibonacci suite in :ref:`Advanced Programming Techniques` section.
... ...
@@ -66,56 +66,56 @@ implementation
  def my_max(seq):
      """
-     return the maximum value in a sequence
+     return the maximum value in a sequence
      work only with integer or float
      """
-     higest = seq[0]
+     highest = seq[0]
      for i in seq:
          if i > highest:
              highest = i
      return highest
- l = [1,2,3,4,58,9]
- print my_max(l)
+ l = [1, 2, 3, 4, 58, 9]
+ print(my_max(l))
  58
  .. _enzyme_exercise:
  Exercise
  --------
- | We want to establish a restriction map of a sequence.
- | But we will do this step by step.
- | and reuse the enzymes used in previous chapter:
+ | We want to establish a restriction map of a sequence.
+ | But we will do this step by step,
+ | and reuse the enzymes used in previous chapter:
- * create a function that take a sequence and an enzyme as parameter and return the position of first binding sites.
-   (write the pseudocode)
+ * Create a function that takes a sequence and an enzyme as parameters, and returns the position of the first binding site.
+   (Write the pseudocode.)
- **pseudocode**
+ **pseudocode**
  | *function one_enz_binding_site(dna, enzyme)*
  | *if enzyme binding site is substring of dna*
- | *return of first position of substring in dna*
+ | *return of first position of substring in dna*
  **implementation**
  .. literalinclude:: _static/code/restriction.py
  :linenos:
  :lines: 1-16
  :language: python
- * improve the previous function to return all positions of binding sites
+ * Improve the previous function to return all positions of binding sites.
  **pseudocode of first algorithm**
  | *function one_enz_binding_sites(dna, enzyme)*
  | *positions <- empty*
  | *if enzyme binding site is substring of dna*
- | *add the position of the first substring in dna in positions*
+ | *add the position of the first substring in dna in positions*
  | *positions <- find binding_sites in rest of dna sequence*
- | *return positions*
+ | *return positions*
  **implementation**
... ...
@@ -140,21 +140,21 @@ Exercise
  :linenos:
  :lines: 34-56
  :language: python
- search all positions of Ecor1 binding sites in dna_1
+ * Search all positions of Ecor1 binding sites in ``dna_1``.
  ::
  ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
  dna_1 = """tcgcgcaacgtcgcctacatctcaagattcagcgccgagatccccgggggttgagcgatccccgtcagttggcgtgaattcag
  cagcagcgcaccccgggcgtagaattccagttgcagataatagctgatttagttaacttggatcacagaagcttccaga
  ccaccgtatggatcccaacgcactgttacggatccaattcgtacgtttggggtgatttgattcccgctgcctgccagg"""
- * generalize the binding sites function to take a list of enzymes and return a list of tuple (enzyme name, position)
+ * Generalize the binding sites function to take a list of enzymes and return a list of tuples (enzyme name, position).
  **pseudocode**
  | *function binding_sites(dna, set of enzymes)*
... ...
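The pseudocode in the hunks above (find the first binding site, then keep searching the rest of the sequence, then repeat for each enzyme) could be sketched roughly as below. The real implementation lives in `_static/code/restriction.py` and is not shown in this diff, so everything here is an assumption: the iterative use of `str.find`, the 0-based positions, the toy DNA string, and the descriptive fields of the `bamh1` tuple (only the name and the `ggatcc` site are meaningful).

```python
def one_enz_binding_sites(dna, enzyme):
    """Return every position where the enzyme's site occurs in dna.

    Assumes the enzyme is a tuple whose third field (index 2) is the
    binding site, as in the ecor1 tuple shown in the diff.
    """
    site = enzyme[2]
    positions = []
    pos = dna.find(site)
    while pos != -1:
        positions.append(pos)
        # Resume the search just after the previous hit.
        pos = dna.find(site, pos + 1)
    return positions


def binding_sites(dna, enzymes):
    """Return (enzyme name, position) tuples sorted by position."""
    sites = []
    for enzyme in enzymes:
        for pos in one_enz_binding_sites(dna, enzyme):
            sites.append((enzyme[0], pos))
    return sorted(sites, key=lambda name_pos: name_pos[1])


ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
# Hypothetical tuple: description, cut offset and end type are placeholders.
bamh1 = ("BamHI", "placeholder description", "ggatcc", 1, "sticky")
# Toy sequence for illustration, not one of the course fragments.
toy_dna = "aagaattcttggatccgaattc"

print(binding_sites(toy_dna, [ecor1, bamh1]))
```

An iterative search is used here instead of the recursive formulation in the pseudocode; both visit the same positions, but the loop avoids slicing the sequence at every step.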
@@ -167,14 +167,15 @@ search all positions of Ecor1 binding sites in dna_1
  **implementation**
- in bonus we can try to sort the list in the order of the position of the binding sites like this:
- [('Sau3aI', 38), ('SmaI', 42), ('Sau3aI', 56), ('EcoRI', 75), ...
+ In bonus, we can try to sort the list in the order of the position of the binding sites like this::
+ [('Sau3aI', 38), ('SmaI', 42), ('Sau3aI', 56), ('EcoRI', 75), ...
  .. literalinclude:: _static/code/restriction.py
  :linenos:
  :lines: 57-
  :language: python
  ::
  ecor1 = ("EcoRI", "Ecoli restriction enzime I", "gaattc", 1, "sticky")
... ...
@@ -187,7 +188,7 @@ in bonus we can try to sort the list in the order of the position of the binding
  hae3 = ("HaeIII", "Haemophilus aegyptius", "ggcc", 2 , "blunt")
  sma1 = ("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt")
- and the 2 dna fragments: ::
+ and the two dna fragments: ::
  dna_1 = """tcgcgcaacgtcgcctacatctcaagattcagcgccgagatccccgggggttgagcgatccccgtcagttggcgtgaattcag
  cagcagcgcaccccgggcgtagaattccagttgcagataatagctgatttagttaacttggatcacagaagcttccaga
... ...
@@ -205,7 +206,7 @@ and the 2 dna fragments: ::
  binding_sites(dna_2, enzymes)
  [('EcoRI', 11), ('NotI', 33), ('HaeIII', 35), ('EcoRI', 98), ('SmaI', 106), ('EcoRI', 179), ('HaeIII', 193), ('EcoRV', 225)]
  :download:`restriction.py <_static/code/restriction.py>` .
  Bonus
... ...
@@ -219,23 +220,22 @@ If you prefer the enzyme implemented as namedtuple:
  Exercise
  --------
- From a list return a new list without any duplicate, but keeping the order of items.
- For example: ::
+ Write a ``uniqify_with_order`` function that takes a list and returns a new list
+ without any duplicate, but keeping the order of items.
+ For instance::
- >>> l = [5,2,3,2,2,3,5,1]
+ >>> l = [5, 2, 3, 2, 2, 3, 5, 1]
  >>> uniqify_with_order(l)
- >>> [5,2,3,1]
+ [5, 2, 3, 1]
- solution ::
+ Solution ::
  >>> uniq = []
  >>> for item in l:
  >>>    if item not in uniq:
  >>>       uniq.append(item)
- solution ::
+ Solution ::
  >>> uniq_items = set()
  >>> l_uniq = [x for x in l if x not in uniq_items and not uniq_items.add(x)]
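The two solutions in the last hunk are written as interactive snippets, while the exercise text asks for a ``uniqify_with_order`` function. A minimal sketch of such a function follows. It is an illustration rather than the course's official answer, and it assumes the items are hashable so they can be stored in a set.

```python
def uniqify_with_order(items):
    """Return a new list without duplicates, keeping first-seen order.

    Assumes every item is hashable (needed for the set lookup).
    """
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


l = [5, 2, 3, 2, 2, 3, 5, 1]
print(uniqify_with_order(l))

# Alternative relying on dicts preserving insertion order (Python 3.7+).
print(list(dict.fromkeys(l)))
```

The set makes each membership test O(1) on average, whereas the first solution in the hunk scans the growing `uniq` list for every item. Compared with the list comprehension in the second solution, the explicit loop avoids relying on `set.add` returning `None` inside a condition.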