.. sectnum::
   :start: 5
   
.. _Collection_Data_types:

*********************
Collection Data Types
*********************

Exercises
=========

Exercise
--------

| Draw the representation in memory of the following expressions.
| What is the data type of each object?

::   

   x = [1, 2, 3, 4]
   y = x[1]
   y = 3.14
   x[1] = 'foo'
   
.. figure:: _static/figs/list_1.png
   :width: 400px
   :alt: set
   :figclass: align-center
   
::

   x = [1, 2, 3, 4]
   x += [5, 6]

.. figure:: _static/figs/augmented_assignment_list.png  
   :width: 400px
   :alt: set
   :figclass: align-center 

::

   >>> x = [1, 2, 3, 4]
   >>> id(x)
   139950507563632
   >>> x += [5,6]
   >>> id(x)
   139950507563632
   
With a mutable object such as a ``list``, mutating the object modifies its state,
but the reference to the object is unchanged.
So when two variables ``x`` and ``y`` reference the same list ``[1,2]`` and we modify the state
of the list itself, but not the references to it, both ``x`` and ``y`` still reference
that list, which now contains ``[1,2,3,4]``.

Compare with the exercise on strings and integers:

Since lists are mutable, when ``+=`` is used the original list object is modified in place, so *x* still references the same object.
We can observe this using *id()*, which returns the identity of an object (in CPython, its memory address).
This identity does not change after the ``+=`` operation.

.. note::
   Even though the result is the same, there is a subtlety in using an augmented operator:
   in ``a operator= b`` Python evaluates ``a`` only once, so it is potentially faster
   than ``a = a operator b``.


compare ::

   x = 3
   y = x
   y += 3
   x = ?
   y = ?
   
   
.. figure:: _static/figs/augmented_assignment_int2.png  
   :width: 400px
   :alt: augmented_assignment
   :figclass: align-center 

   
and ::

   x = [1,2]
   y = x
   y += [3,4]
   x = ?
   y = ?  


.. figure:: _static/figs/augmented_assignment_list2.png  
   :width: 400px
   :alt: list extend
   :figclass: align-center 
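
The difference between the two cases can also be checked in the interpreter.
The following is a minimal sketch, not part of the original exercise: it uses the ``is`` operator
to test whether two names still reference the same object after ``+=``.

.. code-block:: python

   # integers are immutable: += builds a new object and rebinds y only
   x = 3
   y = x
   y += 3
   print(x, y)        # 3 6
   print(x is y)      # False

   # lists are mutable: += extends the existing object in place
   x = [1, 2]
   y = x
   y += [3, 4]
   print(x, y)        # [1, 2, 3, 4] [1, 2, 3, 4]
   print(x is y)      # True

   # by contrast, y = y + [5] creates a new list and rebinds y
   y = y + [5]
   print(x is y)      # False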



Exercise
--------

Without using the Python shell, what are the results of the following statements?

.. note::
   ``sum`` is a function which returns the sum of the elements of a list.
      
::

   x = [1, 2, 3, 4]
   x[3] = -4 # what is the value of x now ?
   y = sum(x)/len(x) #what is the value of y ? why ?
   
   y = 0.5

.. warning::

    In Python 2 the result is ::

        y = 0

    because ``sum(x)`` and ``len(x)`` are both integers, so in Python 2.x ``/`` performs an integer division:
    the fractional part of the result is discarded.
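
In Python 3 both behaviours are available. A minimal sketch (the variable names are only for illustration):
``/`` is the true division and ``//`` is the floor division, which gives the Python 2 result for integers.

.. code-block:: python

   x = [1, 2, 3, -4]
   total = sum(x)               # 2
   true_div = total / len(x)    # 0.5, a float
   floor_div = total // len(x)  # 0, an int, as "/" did on integers in Python 2
   print(true_div, floor_div)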


Exercise
--------

Draw the representation in memory of the following expressions. ::

   x = [1, ['a','b','c'], 3, 4]
   y = x[1]
   y[2] = 'z'
   # what is the value of x ?
   
.. figure:: _static/figs/list_2-1.png
   :width: 400px
   :alt: set
   :figclass: align-center
   

.. container:: clearer

    .. image :: _static/figs/spacer.png
       
 When we execute *y = x[1]*, we create ``y``, which references the list ``['a', 'b', 'c']``.
 This list now has 2 references to it: ``y`` and ``x[1]``.
   
   
.. figure:: _static/figs/list_2-2.png
   :width: 400px
   :alt: set
   :figclass: align-center
 
   
.. container:: clearer

    .. image :: _static/figs/spacer.png
       
   
 This object is a list, so it is mutable.
 So we can access **and** modify it in two ways, through ``y`` or through ``x[1]`` ::
 
   x = [1, ['a','b','z'], 3, 4]
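
This can be checked with a short sketch (an illustration added here, not part of the exercise):
``y`` and ``x[1]`` are two references to the same object, whereas a copy made with ``list()`` is independent.

.. code-block:: python

   x = [1, ['a', 'b', 'c'], 3, 4]
   y = x[1]
   print(y is x[1])     # True: one list, two references

   y[2] = 'z'
   print(x)             # [1, ['a', 'b', 'z'], 3, 4]

   # a copy of the inner list is a distinct object
   z = list(x[1])
   z[0] = 'A'
   print(x[1], z)       # ['a', 'b', 'z'] ['A', 'b', 'z']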
    
Exercise
--------

From the list ``l = [1, 2, 3, 4, 5, 6, 7, 8, 9]`` generate 2 lists: ``l1`` containing all odd values and ``l2`` containing all even values. ::

   l = [1, 2, 3, 4, 5, 6, 7, 8, 9]
   l1 = l[::2]
   l2 = l[1::2]

or, selecting the items by their value rather than by their position ::

    l1 = [item for item in l if item % 2 != 0]
    l2 = [item for item in l if item % 2 == 0]

Exercise
--------
   
Generate a list containing all codons.
   
pseudocode:
"""""""""""

| *function all_codons()*
|     *all_codons <- empty list*
|     *for each first base*
|         *for each second base*
|             *for each third base*
|                 *add the concatenation base 1 + base 2 + base 3 to all_codons*
|     *return all_codons*

first implementation:
"""""""""""""""""""""
.. literalinclude:: _static/code/codons.py
   :linenos:
   :language: python

::

   python -i codons.py 
   >>> codons = all_codons()
   
:download:`codons.py <_static/code/codons.py>` .  

second implementation:
""""""""""""""""""""""

Mathematically speaking, the set of all codons is the Cartesian product
of 3 copies of the alphabet 'acgt'.
In Python there is a function to compute it in the ``itertools`` module:
`product <https://docs.python.org/3/library/itertools.html#itertools.product>`_


.. literalinclude:: _static/code/codons_itertools.py
   :linenos:
   :language: python

::

   python -i codons_itertools.py
   >>> codons = all_codons()
   
:download:`codons_itertools.py <_static/code/codons_itertools.py>` .
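
For readers without the downloadable files at hand, both approaches can be sketched as follows.
This is an illustration under assumptions: the function names ``all_codons_loops`` and ``all_codons_product``
are hypothetical, and the code is not necessarily identical to ``codons.py`` or ``codons_itertools.py``.

.. code-block:: python

   from itertools import product

   def all_codons_loops(alphabet='acgt'):
       """Build all codons with three nested loops, as in the pseudocode."""
       codons = []
       for base_1 in alphabet:
           for base_2 in alphabet:
               for base_3 in alphabet:
                   codons.append(base_1 + base_2 + base_3)
       return codons

   def all_codons_product(alphabet='acgt'):
       """Build all codons as the Cartesian product of 3 copies of the alphabet."""
       return [''.join(bases) for bases in product(alphabet, repeat=3)]

   codons = all_codons_loops()
   print(len(codons))                      # 64
   print(codons[:4])                       # ['aaa', 'aac', 'aag', 'aat']
   print(codons == all_codons_product())   # True

Both functions produce the codons in the same order, because ``product`` varies its last argument fastest, like the innermost loop.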

               
Exercise
--------

From a list, return a new list without any duplicates, regardless of the order of the items.
For example: ::

   >>> l = [5,2,3,2,2,3,5,1]
   >>> uniqify(l)
   [1, 2, 3, 5]  # is one of the solutions


pseudocode:
"""""""""""

| *function uniqify(l)*
|     *uniq <- empty list*
|     *for each element of l*
|        *add the element to uniq if it is not already in uniq*
|     *return uniq*

implementation:
"""""""""""""""

.. literalinclude:: _static/code/uniqify.py
   :linenos:
   :language: python

::

   >>> l=[1,2,3,2,3,4,5,1,2,3,3,2,7,8,9]
   >>> uniqify(l)
   [1, 2, 3, 4, 5, 7, 8, 9]

:download:`uniqify.py <_static/code/uniqify.py>` .

second implementation:
""""""""""""""""""""""

The problem with the first implementation comes from line 4.
Remember that the membership operator uses a linear search on a list, which can be slow for very large collections.
If we plan to use ``uniqify`` with large lists we should find a better algorithm.
In the specification we can read that uniqify can work *regardless of the order of the resulting list*.
So we can take advantage of a property of ``set``, whose items are unique ::

 
   >>> list(set(l))
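
The two approaches can be compared in a short sketch. The first function follows the pseudocode above
(it may differ in detail from ``uniqify.py``); the name ``uniqify_set`` is hypothetical.

.. code-block:: python

   def uniqify(l):
       """Keep the first occurrence of each item; 'in' on a list is a linear search."""
       uniq = []
       for item in l:
           if item not in uniq:
               uniq.append(item)
       return uniq

   def uniqify_set(l):
       """Rely on set: membership is fast on average, but the order is not guaranteed."""
       return list(set(l))

   l = [5, 2, 3, 2, 2, 3, 5, 1]
   print(uniqify(l))                                     # [5, 2, 3, 1]
   print(sorted(uniqify_set(l)))                         # [1, 2, 3, 5]
   print(sorted(uniqify(l)) == sorted(uniqify_set(l)))   # True

Note that the items must be hashable to be stored in a ``set``.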


Exercise
--------

We need to compute the number of occurrences of all kmers of a given length present in a sequence.

Below we propose 2 algorithms. 

pseudo code 1
"""""""""""""

|   *function get_kmer_occurences(seq, kmer_len)*
|      *all_kmers <- generate all possible kmers of length kmer_len*
|      *occurrences <- empty*
|      *for each kmer in all_kmers*
|         *count the occurrences of kmer in seq*
|         *store this count in occurrences*
|      *return occurrences*
     
pseudo code 2
"""""""""""""

|  *function get_kmer_occurences(seq, kmer_len)*
|     *all_kmers <- empty*
|     *from i = 0 to sequence length - kmer_len*
|        *kmer <- kmer starting at position i in sequence*
|        *increase by 1 the number of occurrences of kmer in all_kmers*
|     *return all_kmers*
 

.. note::

   Computer scientists typically measure an algorithm’s efficiency in terms of its worst-case running time,
   which is the largest amount of time an algorithm can take given the most difficult input of a fixed size.
   The advantage of considering the worst-case running time is that we are guaranteed that our algorithm
   will never behave worse than our worst-case estimate.

   Big-O notation compactly describes the running time of an algorithm.
   For example, if your algorithm for sorting an array of n numbers takes roughly n\ :sup:`2` operations for the most difficult dataset,
   then we say that the running time of your algorithm is O(n\ :sup:`2`). In reality, depending on your implementation, it may use any number of operations,
   such as 1.5n\ :sup:`2`, n\ :sup:`2` + n + 2, or 0.5n\ :sup:`2` + 1; all these algorithms are O(n\ :sup:`2`) because big-O notation only cares about the term that grows the fastest with
   respect to the size of the input. This is because as n grows very large, the difference in behavior between two O(n\ :sup:`2`) functions,
   like 999 · n\ :sup:`2` and n\ :sup:`2` + 3n + 9999999, is negligible when compared to the behavior of functions from different classes,
   say O(n\ :sup:`2`) and O(n\ :sup:`6`). Of course, we would prefer an algorithm requiring 1/2 · n\ :sup:`2` steps to an algorithm requiring 1000 · n\ :sup:`2` steps.

   When we write that the running time of an algorithm is O(n\ :sup:`2`), we technically mean that it does not grow faster than a function with a
   leading term of c · n\ :sup:`2`, for some constant c. Formally, a function f(n) is Big-O of function g(n), or O(g(n)), when f(n) <= c · g(n) for some
   constant c and sufficiently large n.

   For more on Big-O notation, see `A Beginner's Guide to Big-O Notation <http://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/>`_.
   

Compare the pseudocode of each of them and implement the fastest one. ::

   """gtcagaccttcctcctcagaagctcacagaaaaacacgctttctgaaagattccacactcaatgccaaaatataccacag
      gaaaattttgcaaggctcacggatttccagtgcaccactggctaaccaagtaggagcacctcttctactgccatgaaagg
      aaaccttcaaaccctaccactgagccattaactaccatcctgtttaagatctgaaaaacatgaagactgtattgctcctg
      atttgtcttctaggatctgctttcaccactccaaccgatccattgaactaccaatttggggcccatggacagaaaactgc
      agagaagcataaatatactcattctgaaatgccagaggaagagaacacagggtttgtaaacaaaggtgatgtgctgtctg
      gccacaggaccataaaagcagaggtaccggtactggatacacagaaggatgagccctgggcttccagaagacaaggacaa
      ggtgatggtgagcatcaaacaaaaaacagcctgaggagcattaacttccttactctgcacagtaatccagggttggcttc
      tgataaccaggaaagcaactctggcagcagcagggaacagcacagctctgagcaccaccagcccaggaggcacaggaaac
      acggcaacatggctggccagtgggctctgagaggagaaagtccagtggatgctcttggtctggttcgtgagcgcaacaca"""


In the first algorithm:

| we first generate all possible kmers: there are 4\ :sup:`kmer length` of them,
| then we count the occurrences of each kmer in the sequence.
| So for each kmer we read the whole sequence, and the algorithm is in O(4\ :sup:`kmer length` * ``sequence length``).

| In the second algorithm we read the sequence only once,
| so the algorithm is in O(sequence length).


Compute the occurrences of the 6-mers of the sequence above, and print each 6-mer and its number of occurrences, one per line.

.. literalinclude:: _static/code/kmer.py
   :linenos:
   :language: python

::

   >>> s = """gtcagaccttcctcctcagaagctcacagaaaaacacgctttctgaaagattccacactcaatgccaaaatataccacag
   ... gaaaattttgcaaggctcacggatttccagtgcaccactggctaaccaagtaggagcacctcttctactgccatgaaagg
   ... aaaccttcaaaccctaccactgagccattaactaccatcctgtttaagatctgaaaaacatgaagactgtattgctcctg
   ... atttgtcttctaggatctgctttcaccactccaaccgatccattgaactaccaatttggggcccatggacagaaaactgc
   ... agagaagcataaatatactcattctgaaatgccagaggaagagaacacagggtttgtaaacaaaggtgatgtgctgtctg
   ... gccacaggaccataaaagcagaggtaccggtactggatacacagaaggatgagccctgggcttccagaagacaaggacaa
   ... ggtgatggtgagcatcaaacaaaaaacagcctgaggagcattaacttccttactctgcacagtaatccagggttggcttc
   ... tgataaccaggaaagcaactctggcagcagcagggaacagcacagctctgagcaccaccagcccaggaggcacaggaaac
   ... acggcaacatggctggccagtgggctctgagaggagaaagtccagtggatgctcttggtctggttcgtgagcgcaacaca"""
   >>> s = s.replace('\n', '')
   >>> kmers = get_kmer_occurences(s, 6)
   >>> for kmer in kmers:
   ...     print(kmer[0], '..', kmer[1])
   gcagag .. 2
   aacttc .. 1
   gcaact .. 1
   aaatat .. 2
   
   
:download:`kmer.py <_static/code/kmer.py>` .
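
As a reminder of the idea, the second algorithm can be sketched with a ``dict``.
The name ``count_kmers`` is hypothetical and this sketch is not necessarily the code of ``kmer.py``;
the short sequence is made up for the illustration.

.. code-block:: python

   def count_kmers(seq, kmer_len):
       """Read the sequence once and count every kmer of length kmer_len."""
       occurrences = {}
       # the last kmer starts at position len(seq) - kmer_len
       for i in range(len(seq) - kmer_len + 1):
           kmer = seq[i:i + kmer_len]
           # dict.get returns 0 the first time a kmer is seen
           occurrences[kmer] = occurrences.get(kmer, 0) + 1
       return occurrences

   counts = count_kmers('acgtacgta', 3)
   for kmer, occ in counts.items():
       print(kmer, '..', occ)

Only the kmers actually present in the sequence are stored, so the 4\ :sup:`kmer length` possible kmers are never generated.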


bonus:
""""""

Print the kmers ordered by number of occurrences.

| see `sort <https://docs.python.org/3/library/stdtypes.html#mutable-sequence-types>`_
| see `operator.itemgetter <https://docs.python.org/3/library/operator.html#operator.itemgetter>`_


.. literalinclude:: _static/code/kmer_2.py
   :linenos:
   :language: python

::

   >>> s = """gtcagaccttcctcctcagaagctcacagaaaaacacgctttctgaaagattccacactcaatgccaaaatataccacag
   ... gaaaattttgcaaggctcacggatttccagtgcaccactggctaaccaagtaggagcacctcttctactgccatgaaagg
   ... aaaccttcaaaccctaccactgagccattaactaccatcctgtttaagatctgaaaaacatgaagactgtattgctcctg
   ... atttgtcttctaggatctgctttcaccactccaaccgatccattgaactaccaatttggggcccatggacagaaaactgc
   ... agagaagcataaatatactcattctgaaatgccagaggaagagaacacagggtttgtaaacaaaggtgatgtgctgtctg
   ... gccacaggaccataaaagcagaggtaccggtactggatacacagaaggatgagccctgggcttccagaagacaaggacaa
   ... ggtgatggtgagcatcaaacaaaaaacagcctgaggagcattaacttccttactctgcacagtaatccagggttggcttc
   ... tgataaccaggaaagcaactctggcagcagcagggaacagcacagctctgagcaccaccagcccaggaggcacaggaaac
   ... acggcaacatggctggccagtgggctctgagaggagaaagtccagtggatgctcttggtctggttcgtgagcgcaacaca"""
   >>> s = s.replace('\n', '')
   >>> kmers = get_kmer_occurences(s, 6)
   >>> for kmer, occ in kmers:
   ...     print(kmer, '..', occ)
   cacagg .. 4
   aggaaa .. 4
   ttctga .. 3
   ccagtg .. 3
   
   
:download:`kmer_2.py <_static/code/kmer_2.py>` .
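
Sorting (kmer, occurrences) pairs can be sketched as follows on a small hand-made ``dict``.
The counts below are made up for the illustration, they are not the result for the sequence above,
and the code is not necessarily that of ``kmer_2.py``.

.. code-block:: python

   from operator import itemgetter

   occurrences = {'acg': 2, 'tac': 1, 'cgt': 4, 'gta': 3}

   # items() yields (kmer, count) pairs; itemgetter(1) selects the count as sort key
   by_occ = sorted(occurrences.items(), key=itemgetter(1), reverse=True)
   for kmer, occ in by_occ:
       print(kmer, '..', occ)

``sorted`` returns a new list and leaves the ``dict`` untouched, whereas the ``sort`` method of a list sorts it in place.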


Exercise
--------

| Write a function which takes a sequence as parameter and returns its reverse complement.
| Write the pseudocode before proposing an implementation.

pseudocode:
"""""""""""

| *function reverse_comp(sequence)*
|     *complement <- establish a correspondence between each base and its complement*
|     *rev_seq <- reverse the sequence*
|     *rev_comp <- empty*
|     *for each nt of rev_seq*
|        *concatenate the complement of nt to rev_comp*
|     *return rev_comp*

.. literalinclude:: _static/code/rev_comp.py
   :linenos:
   :language: python

::

   >>> from rev_comp import rev_comp
   >>>
   >>> seq = 'acggcaacatggctggccagtgggctctgagaggagaaagtccagtggatgctcttggtctggttcgtgagcgcaacaca'
   >>> print(rev_comp(seq))
   tgtgttgcgctcacgaaccagaccaagagcatccactggactttctcctctcagagcccactggccagccatgttgccgt
   
:download:`rev_comp.py <_static/code/rev_comp.py>` .

  
other solution
""""""""""""""

Python provides an interesting method for our problem.
The ``translate`` method works on strings and needs a parameter which is an object
that maps the characters of the old string to those of the new one.
``maketrans`` builds this object. In Python 3 it is a static method of ``str`` (``str.maketrans``);
in Python 2 it was a function of the ``string`` module.
Here ``maketrans`` takes 2 arguments, two strings: the first string contains the characters
to change, the second string the corresponding characters in the new string.
Thus the two strings **must** have the same length. The correspondence between
the characters to change and their new values is based on their position:
the first character of the first string will be replaced by the first character of the second string,
the second character of the first string will be replaced by the second character of the second string, and so on.
So we can write the reverse complement without a loop.
   
.. literalinclude:: _static/code/rev_comp2.py
   :linenos:
   :language: python

::

   >>> from rev_comp2 import rev_comp
   >>>
   >>> seq = 'acggcaacatggctggccagtgggctctgagaggagaaagtccagtggatgctcttggtctggttcgtgagcgcaacaca'
   >>> print(rev_comp(seq))
   tgtgttgcgctcacgaaccagaccaagagcatccactggactttctcctctcagagcccactggccagccatgttgccgt
   
:download:`rev_comp2.py <_static/code/rev_comp2.py>` .
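
A minimal sketch of this idea for Python 3, where ``maketrans`` is reached through ``str``.
It is given for illustration, assumes a lower case sequence, and is not necessarily identical to ``rev_comp2.py``.

.. code-block:: python

   def rev_comp(seq):
       """Return the reverse complement of a lower case DNA sequence."""
       # position by position: 'a' -> 't', 'c' -> 'g', 'g' -> 'c', 't' -> 'a'
       complement = str.maketrans('acgt', 'tgca')
       # seq[::-1] reverses the sequence, translate replaces each base
       return seq[::-1].translate(complement)

   print(rev_comp('aacgt'))   # acgtt

Characters which are not in the table, such as ``n``, are left unchanged by ``translate``.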

Exercise
--------

Consider the following collection of enzymes.
We decide to implement each enzyme as a tuple with the following structure
``("name", "comment", "sequence", "cut", "end")`` ::

   ecor1 = ("EcoRI", "Ecoli restriction enzyme I", "gaattc", 1, "sticky")
   ecor5 = ("EcoRV", "Ecoli restriction enzyme V", "gatatc", 3, "blunt")
   bamh1 = ("BamHI", "type II restriction endonuclease from Bacillus amyloliquefaciens ", "ggatcc", 1, "sticky")
   hind3 = ("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1 , "sticky")
   taq1 = ("TaqI", "Thermus aquaticus", "tcga", 1 , "sticky")
   not1 = ("NotI", "Nocardia otitidis", "gcggccgc", 2 , "sticky")
   sau3a1 = ("Sau3aI", "Staphylococcus aureus", "gatc", 0 , "sticky")
   hae3 = ("HaeIII", "Haemophilus aegyptius", "ggcc", 2 , "blunt")
   sma1 =  ("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt")

and the 2 dna fragments: ::

   dna_1 = """tcgcgcaacgtcgcctacatctcaagattcagcgccgagatccccgggggttgagcgatccccgtcagttggcgtgaattcag
   cagcagcgcaccccgggcgtagaattccagttgcagataatagctgatttagttaacttggatcacagaagcttccaga
   ccaccgtatggatcccaacgcactgttacggatccaattcgtacgtttggggtgatttgattcccgctgcctgccagg"""

   dna_2 = """gagcatgagcggaattctgcatagcgcaagaatgcggccgcttagagcgatgctgccctaaactctatgcagcgggcgtgagg
   attcagtggcttcagaattcctcccgggagaagctgaatagtgaaacgattgaggtgttgtggtgaaccgagtaag
   agcagcttaaatcggagagaattccatttactggccagggtaagagttttggtaaatatatagtgatatctggcttg"""

| Which enzymes cut dna_1?
| Which enzymes cut dna_2?
| Which enzymes cut dna_1 but not dna_2?


* In a file <my_file.py>
    #. Write a function *one_line* which takes a multi-line sequence and returns the sequence on one line.
    #. Write a function *enz_filter* which takes a list of enzymes and a sequence, and returns a new list containing
       the enzymes which have a binding site in the sequence.
    #. Open a terminal with the command python -i <my_file.py>.
    #. Copy and paste the enzymes and dna fragments.
    #. Use the functions above to compute the enzymes which cut dna_1,
       apply the same functions to compute the enzymes which cut dna_2,
       then compute the difference between the enzymes which cut dna_1 and the enzymes which cut dna_2.
   
.. literalinclude:: _static/code/enzyme_1.py
   :linenos:
   :language: python

::

   from enzyme_1 import *

   enzymes = [ecor1, ecor5, bamh1, hind3, taq1, not1, sau3a1, hae3, sma1]
   dna_1 = one_line(dna_1)
   dna_2 = one_line(dna_2)
   enz_1 = enz_filter(enzymes, dna_1)
   enz_2 = enz_filter(enzymes, dna_2)
   enz1_only = set(enz_1) - set(enz_2)

:download:`enzymes_1.py <_static/code/enzyme_1.py>` .
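
A possible sketch of the two functions, under the assumption that an enzyme is one of the tuples defined above
(the recognition sequence is the item at index 2). It is not necessarily the code of ``enzyme_1.py``,
and the short fragments ``frag_1`` and ``frag_2`` are made up for the illustration.

.. code-block:: python

   def one_line(seq):
       """Return the sequence without any newline character."""
       return seq.replace('\n', '')

   def enz_filter(enzymes, dna):
       """Return the list of enzymes whose recognition sequence occurs in dna."""
       cutting_enz = []
       for enz in enzymes:
           if enz[2] in dna:
               cutting_enz.append(enz)
       return cutting_enz

   ecor1 = ("EcoRI", "Ecoli restriction enzyme I", "gaattc", 1, "sticky")
   sma1 = ("SmaI", "Serratia marcescens", "cccggg", 3, "blunt")
   enzymes = [ecor1, sma1]

   frag_1 = one_line("ttgaat\ntcacccgggtt")   # the EcoRI site spans the line break
   frag_2 = one_line("aagaattcaa")

   enz_1 = enz_filter(enzymes, frag_1)         # EcoRI and SmaI
   enz_2 = enz_filter(enzymes, frag_2)         # EcoRI only
   enz1_only = set(enz_1) - set(enz_2)
   print([enz[0] for enz in enz1_only])        # ['SmaI']

Removing the newlines first matters: a site split over two lines would otherwise be missed.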

With this algorithm we find out whether an enzyme cuts the dna, but not how many times it cuts.
We can count the sites of each enzyme (item 0 of the tuple is the name, item 2 the sequence) ::

   enzymes = [ecor1, ecor5, bamh1, hind3, taq1, not1, sau3a1, hae3, sma1]
   for enz in enzymes:
      print(enz[0], dna_1.count(enz[2]))

The latter algorithm displays the number of occurrences of each enzyme's site, but we cannot determine the position of each site.
We will see how to do this later.

Bonus
^^^^^

There is another kind of tuple which allows access to items by index or by name.
This data collection is called ``namedtuple``. It is not a builtin: it lives in the ``collections`` module,
so we have to import it before using it.
We also have to define which name corresponds to which item::

    import collections
    RestrictEnzyme = collections.namedtuple("RestrictEnzyme", ("name", "comment", "sequence", "cut", "end"))

Then we can use this new kind of tuple::

    ecor1 = RestrictEnzyme("EcoRI", "Ecoli restriction enzyme I", "gaattc", 1, "sticky")
    ecor5 = RestrictEnzyme("EcoRV", "Ecoli restriction enzyme V", "gatatc", 3, "blunt")
    bamh1 = RestrictEnzyme("BamHI", "type II restriction endonuclease from Bacillus amyloliquefaciens ", "ggatcc", 1, "sticky")
    hind3 = RestrictEnzyme("HindIII", "type II site-specific nuclease from Haemophilus influenzae", "aagctt", 1 , "sticky")
    taq1 = RestrictEnzyme("TaqI", "Thermus aquaticus", "tcga", 1 , "sticky")
    not1 = RestrictEnzyme("NotI", "Nocardia otitidis", "gcggccgc", 2 , "sticky")
    sau3a1 = RestrictEnzyme("Sau3aI", "Staphylococcus aureus", "gatc", 0 , "sticky")
    hae3 = RestrictEnzyme("HaeIII", "Haemophilus aegyptius", "ggcc", 2 , "blunt")
    sma1 =  RestrictEnzyme("SmaI", "Serratia marcescens", "cccggg", 3 , "blunt")

The code must be adapted as below

.. literalinclude:: _static/code/enzyme_1_namedtuple.py
   :linenos:
   :language: python

:download:`enzymes_1_namedtuple.py <_static/code/enzyme_1_namedtuple.py>` .
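
A short sketch of what a named tuple offers (an illustration, not the content of ``enzyme_1_namedtuple.py``):
items can still be reached by index, but also by name, which makes the code easier to read.
The fragment ``dna`` is made up for the example.

.. code-block:: python

   import collections

   RestrictEnzyme = collections.namedtuple("RestrictEnzyme", ("name", "comment", "sequence", "cut", "end"))
   ecor1 = RestrictEnzyme("EcoRI", "Ecoli restriction enzyme I", "gaattc", 1, "sticky")

   print(ecor1[0], ecor1[2])            # EcoRI gaattc
   print(ecor1.name, ecor1.sequence)    # EcoRI gaattc

   # a named tuple is still a tuple: it is immutable and can be unpacked
   name, comment, sequence, cut, end = ecor1
   print(isinstance(ecor1, tuple))      # True

   dna = "ttgaattcacgaattcgg"
   print(ecor1.name, dna.count(ecor1.sequence))   # EcoRI 2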

Exercise
--------

Given the following dict: ::

   d = {1 : 'a', 2 : 'b', 3 : 'c' , 4 : 'd'}
   
We want to obtain a new dict with the keys and the values inverted: ::

   inverted_d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

solution ::

   inverted_d = {}
   for key in d.keys():
       inverted_d[d[key]] = key
       
solution ::

   inverted_d = {}
   for key, value in d.items():
       inverted_d[value] = key
              
solution ::

   inverted_d = {v : k for k, v in d.items()}
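
The three solutions give the same result, and they all assume that the values of ``d`` are unique
(and hashable, since they become keys). The sketch below checks the result on ``d`` and shows what happens
with a duplicated value; the dict ``dup`` is made up for the illustration.

.. code-block:: python

   d = {1: 'a', 2: 'b', 3: 'c', 4: 'd'}
   inverted_d = {v: k for k, v in d.items()}
   print(inverted_d == {'a': 1, 'b': 2, 'c': 3, 'd': 4})   # True

   # with a duplicated value, the later key overwrites the earlier one
   dup = {1: 'a', 2: 'b', 3: 'a'}
   inverted_dup = {v: k for k, v in dup.items()}
   print(inverted_dup)   # {'a': 3, 'b': 2}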