.. Data_Types.rst, committed by Bertrand NÉRON between Jul 14, 2014 and Mar 09, 2017.

.. sectnum::
   :start: 4

.. _Data_Types:

**********
Data Types
**********

Exercises
=========

Exercise
--------

Assume that we execute the following assignment statements: ::

   width = 17
   height = 12.0
   delimiter = '.'

For each of the following expressions, write the value of the expression and the type
(of the value of the expression) and explain.

#. width / 2
#. width / 2.0
#. height / 3
#. 1 + 2 * 5

Use the Python interpreter to check your answers. ::

   >>> width = 17
   >>> height = 12.0
   >>> delimiter = '.'
   >>>
   >>> width / 2
   8
   >>> # both operands are integers, so Python 2 performs a Euclidean division and throws away the remainder
   >>> width / 2.0
   8.5
   >>> height / 3
   4.0
   >>> # one of the operands is a float (2.0 or height), so Python performs a float division,
   >>> # but keep in mind that floats are approximations.
   >>> # If you need exact decimal arithmetic, use Decimal.
   >>> # Operations on Decimal are slow and floats offer enough precision for most uses,
   >>> # so we use Decimal only when we need great precision.
   >>> # Euclidean division
   >>> 2 / 3
   0
   >>> # float division
   >>> float(2) / float(3)
   0.6666666666666666
   >>> # decimal division
   >>> from decimal import Decimal
   >>> a = Decimal(2)
   >>> b = Decimal(3)
   >>> a / b
   Decimal('0.6666666666666666666666666667')
   >>> 1 + 2 * 5
   11

Exercise
--------

Write a function which takes a radius as input and returns the volume of a sphere.

The volume of a sphere with radius r is 4/3 πr\ :sup:`3`.

What is the volume of a sphere with radius 5?

**Hint**: π is in the ``math`` module, so to access it you need to import the ``math`` module.
Place the ``import`` statement at the top of your file.
After that, you can use ``math.pi`` everywhere in the file, like this::

   >>> import math
   >>>
   >>> # do what you need to do
   >>> math.pi  # use math.pi

**Hint**: the volume of a sphere with radius 5 is **not** 392.7!

.. literalinclude:: _static/code/vol_of_sphere.py
   :linenos:
   :language: python

::

   python -i vol_of_sphere.py
   >>> vol_of_sphere(5)
   523.5987755982989

:download:`vol_of_sphere.py <_static/code/vol_of_sphere.py>` .

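
A minimal sketch of such a function, which may differ from the included ``vol_of_sphere.py``. It writes the 4/3 factor with floats: in Python 2, ``4 / 3`` between two integers evaluates to 1, which is how the wrong value from the hint arises. The name ``truncated`` is only used here for illustration.

```python
import math


def vol_of_sphere(radius):
    """Return the volume of a sphere with the given radius."""
    # 4.0 / 3.0 forces a float division; with integers, Python 2 would
    # evaluate 4 / 3 to 1 and silently give a wrong volume.
    return 4.0 / 3.0 * math.pi * radius ** 3


# the wrong value from the hint corresponds to a 4/3 factor truncated to 1
truncated = 1 * math.pi * 5 ** 3

print(vol_of_sphere(5))      # close to 523.6
print(round(truncated, 1))   # 392.7
```
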

Exercise
--------

Draw what happens in memory when the following statements are executed: ::

   i = 12
   i += 2

.. figure:: _static/figs/augmented_assignment_int.png
   :width: 400px
   :alt: set
   :figclass: align-center

::

   >>> i = 12
   >>> id(i)
   33157200
   >>> i += 2
   >>> id(i)
   33157152

and ::

   s = 'gaa'
   s = s + 'ttc'

.. figure:: _static/figs/augmented_assignment_string.png
   :width: 400px
   :alt: set
   :figclass: align-center

::

   >>> s = 'gaa'
   >>> id(s)
   139950507582368
   >>> s = s + 'ttc'
   >>> s
   'gaattc'
   >>> id(s)
   139950571818896

When an augmented assignment operator is used on an immutable object:

#. the operation is performed,
#. an object holding the result is created,
#. and the target object reference is re-bound to refer to the result object
   rather than the object it referred to before.

So, in the preceding case, when the statement ``i += 2`` is encountered, Python computes 12 + 2,
stores the result in a new int object, and then rebinds ``i`` to refer to this new int.
If the object ``i`` originally referred to has no more references to it,
it will be scheduled for garbage collection.
The same mechanism applies to all immutable objects, including strings.

Exercise
--------

How can we obtain a new sequence which is the motif "AGGTCGACCAGATTANTCCG" repeated 10 times? ::

   >>> s = "AGGTCGACCAGATTANTCCG"
   >>> s10 = s * 10

Exercise
--------

Create a representation in FASTA format of the following sequence:

.. note::
   A sequence in FASTA format begins with a single-line description, followed by lines of sequence data.
   The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column.
   The word following the ">" symbol is the identifier of the sequence, and the rest of the line is the description (optional).
   There should be no space between the ">" and the first letter of the identifier.
   The sequence ends if another line starting with a ">" appears; this indicates the start of another sequence.

::

   id = "sp|P60568|IL2_HUMAN"
   comment = "Interleukin-2 OS=Homo sapiens GN=IL2 PE=1 SV=1"
   sequence = """MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRML
   TFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSE
   TTFMCEYADETATIVEFLNRWITFCQSIISTLT"""

   >>> # the description line must start with ">" immediately followed by the identifier
   >>> s = '>' + id + ' ' + comment + '\n' + sequence

or ::

   >>> s = ">{id} {comment}\n{sequence}".format(id=id, comment=comment, sequence=sequence)

Exercise
--------

For the following exercise, use the Python file :download:`sv40 in fasta <_static/code/sv40_file.py>`,
which is a Python file with the sequence of sv40 in FASTA format already embedded,
and use ``python -i sv40_file.py`` to work.

How long is sv40 in bp?
Hint: the FASTA header is 61 characters long.
(http://www.ncbi.nlm.nih.gov/nuccore/J02400.1)

Write a function ``fasta_to_one_line`` that returns a sequence as a string
without the header or any non-sequence characters.

pseudocode:

| *function fasta_to_one_line(seq)*
|    *header_end_at <- find the first return line character*
|    *raw_seq <- remove header from sequence*
|    *raw_seq <- remove non sequence chars*
|    *return raw_seq*

.. literalinclude:: _static/code/fasta_to_one_line.py
   :linenos:
   :language: python

:download:`fasta_to_one_line.py <_static/code/fasta_to_one_line.py>` .

::

   python
   >>> import sv40_file
   >>> from fasta_to_one_line import fasta_to_one_line
   >>>
   >>> sv40_sequence = fasta_to_one_line(sv40_file.sv40_fasta)
   >>> print len(sv40_sequence)
   5243

Do the following enzymes:

* BamHI (ggatcc),
* EcoRI (gaattc),
* HindIII (aagctt),
* SmaI (cccggg)

have recognition sites in sv40 (just answer by True or False)? ::

   >>> "ggatcc".upper() in sv40_sequence
   True
   >>> "gaattc".upper() in sv40_sequence
   True
   >>> "aagctt".upper() in sv40_sequence
   True
   >>> "cccggg".upper() in sv40_sequence
   False

For the enzymes which have a recognition site, can you give their positions? ::

   >>> sv40_sequence = sv40_sequence.lower()
   >>> sv40_sequence.find("ggatcc")
   2532
   >>> # remember that string indices start at 0
   >>> 2532 + 1
   2533
   >>> # the recognition motif of BamHI starts at 2533
   >>> sv40_sequence.find("gaattc")
   1781
   >>> # EcoRI -> 1782
   >>> sv40_sequence.find("aagctt")
   1045
   >>> # HindIII -> 1046

Is there only one site in sv40 per enzyme?

The ``find`` method gives the index of the first occurrence, or -1 if the substring is not found.
So we cannot determine the number of occurrences of a site with the ``find`` method alone.
We will see how to do that when we learn looping and conditions.

Exercise
--------

We want to perform a PCR on sv40. Can you give the length and the sequence of the amplicon?

Write a function which has 3 parameters: ``sequence``, ``primer_1`` and ``primer_2``.

* *We consider only the cases where primer_1 and primer_2 are present in sequence.*
* *To simplify the exercise, the 2 primers can be read directly on the sv40 sequence.*

Test your algorithm with the following primers:

| primer_1 : 5' CGGGACTATGGTTGCTGACT 3'
| primer_2 : 5' TCTTTCCGCCTCAGAAGGTA 3'

Write the pseudocode before implementing it.

| *function amplicon_len(sequence, primer_1, primer_2)*
|    *pos_1 <- find position of primer_1 in sequence*
|    *pos_2 <- find position of primer_2 in sequence*
|    *amplicon length <- pos_2 + length(primer_2) - pos_1*
|    *return amplicon length*

.. literalinclude:: _static/code/amplicon_len.py
   :linenos:
   :language: python

::

   >>> import sv40_file
   >>> from fasta_to_one_line import fasta_to_one_line
   >>> from amplicon_len import amplicon_len
   >>>
   >>> first_primer = "CGGGACTATGGTTGCTGACT"
   >>> second_primer = "TCTTTCCGCCTCAGAAGGTA"
   >>> sequence = fasta_to_one_line(sv40_file.sv40_fasta)
   >>> print amplicon_len(sequence, first_primer, second_primer)
   199

:download:`amplicon_len.py <_static/code/amplicon_len.py>` .

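
The pseudocode above can be sketched as follows. This is only an illustration and may differ from the included ``amplicon_len.py``. The helper ``amplicon_seq`` and the toy sequence are hypothetical, chosen so that the result is easy to check by hand. As in the exercise, both primers are assumed to be readable directly on the sequence, with ``primer_1`` upstream of ``primer_2``.

```python
def amplicon_len(sequence, primer_1, primer_2):
    """Return the length of the amplicon delimited by the two primers."""
    pos_1 = sequence.find(primer_1)
    pos_2 = sequence.find(primer_2)
    # the amplicon runs from the start of primer_1 to the end of primer_2
    return pos_2 + len(primer_2) - pos_1


def amplicon_seq(sequence, primer_1, primer_2):
    """Return the amplicon itself (hypothetical helper, same assumptions)."""
    pos_1 = sequence.find(primer_1)
    pos_2 = sequence.find(primer_2)
    return sequence[pos_1:pos_2 + len(primer_2)]


# toy data, not taken from sv40
toy_sequence = "ttttacgtggggggccattttt"
print(amplicon_len(toy_sequence, "acgt", "ccat"))  # 14
print(amplicon_seq(toy_sequence, "acgt", "ccat"))  # acgtggggggccat
```

If a primer is missing, ``find`` returns -1 and the computed length is meaningless, which is why the exercise restricts itself to primers that are present in the sequence.
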

Exercise
--------

Reverse the following sequence "TACCTTCTGAGGCGGAAAGA" (don't compute the complement): ::

   >>> "TACCTTCTGAGGCGGAAAGA"[::-1]

or ::

   >>> s = "TACCTTCTGAGGCGGAAAGA"
   >>> l = list(s)
   >>> # take care: reverse() reverses a list in place (the method has a side effect and returns None),
   >>> # so if you don't have an object reference on the list you cannot get the reversed list!
   >>> l.reverse()
   >>> print l
   >>> ''.join(l)

or ::

   >>> rev_s = reversed(s)
   >>> ''.join(rev_s)

The most efficient way to reverse a string or a list is to use a slice.

Exercise
--------

| The il2_human contains 4 cysteines (C) in positions 9, 78, 125, 145.
| We want to generate the sequence of a mutant where the cysteines 78 and 125 are replaced by serines (S).
| Write the pseudocode before proposing an implementation:

We have to take care of the difference between string numbering and sequence numbering:

| C in seq -> in string
| 9 -> 8
| 78 -> 77
| 125 -> 124
| 145 -> 144

| *generate 3 slices from the il2_human*
| *head <- from the beginning, cut between the first cysteine and the second*
| *body <- include the 2nd and 3rd cysteine*
| *tail <- cut after the 3rd cysteine until the end*
| *replace body cysteine by serine*
| *make new sequence with head body_mutate tail*

::

   il2_human = 'MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSETTFMCEYADETATIVEFLNRWITFCQSIISTLT'

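
Before slicing, the position table above can be checked directly against the string. A small sketch, where ``residue_at`` is a hypothetical helper (not part of the course files) that converts a 1-based sequence position into a 0-based string index:

```python
il2_human = ('MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRML'
             'TFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSE'
             'TTFMCEYADETATIVEFLNRWITFCQSIISTLT')


def residue_at(sequence, position):
    """Return the residue at a 1-based sequence position (hypothetical helper)."""
    # sequence positions start at 1, string indices start at 0
    return sequence[position - 1]


# each of the four positions should hold a cysteine
print(residue_at(il2_human, 9))
print(residue_at(il2_human, 78))
print(residue_at(il2_human, 125))
print(residue_at(il2_human, 145))
```

Each call is expected to return ``'C'``, which confirms that the string indices to use for slicing are 8, 77, 124 and 144.
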

::

   head = il2_human[:77]
   body = il2_human[77:125]
   # the tail starts at index 125, right after the body, so that no residue is dropped
   tail = il2_human[125:]
   body_mutate = body.replace('C', 'S')
   il2_mutate = head + body_mutate + tail

Exercise
--------

Write a function

* which takes a sequence as parameter,
* computes the GC%,
* and returns it.
* Display the result in a human-readable form, as a micro report like this:
  'the sv40 is 5243 bp long and has 40.80% gc'

Use the sv40 sequence to test your function.

.. literalinclude:: _static/code/gc_percent.py
   :linenos:
   :language: python

::

   >>> import sv40_file
   >>> from fasta_to_one_line import fasta_to_one_line
   >>> from gc_percent import gc_percent
   >>>
   >>> sequence = fasta_to_one_line(sv40_file.sv40_fasta)
   >>> gc_pc = gc_percent(sequence)
   >>> report = "the sv40 is {0} bp long and has {1:.2%} gc".format(len(sequence), gc_pc)
   >>> print report
   the sv40 is 5243 bp long and has 40.80% gc

:download:`gc_percent.py <_static/code/gc_percent.py>` .

::

   gc_pc = float(sv40_sequence.count('g') + sv40_sequence.count('c')) / float(len(sv40_sequence))
   "the sv40 is {0} bp long and has {1:.2%} gc".format(len(sv40_sequence), gc_pc)

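
A minimal sketch of a ``gc_percent`` function, which may differ from the included ``gc_percent.py``. It returns a fraction between 0 and 1, because the ``{1:.2%}`` format used in the report multiplies by 100 itself. The lower-casing step and the toy sequence are assumptions made for the illustration.

```python
def gc_percent(sequence):
    """Return the GC content of sequence as a fraction between 0 and 1."""
    # lower-case first so that 'G'/'C' and 'g'/'c' are counted alike
    seq = sequence.lower()
    gc_count = seq.count('g') + seq.count('c')
    # float() forces a float division under Python 2 as well
    return float(gc_count) / len(seq)


toy_sequence = "ATGCGCTA"  # toy data: 4 G or C out of 8 bases
report = "the toy sequence is {0} bp long and has {1:.2%} gc".format(
    len(toy_sequence), gc_percent(toy_sequence))
print(report)  # the toy sequence is 8 bp long and has 50.00% gc
```
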