.. sectnum::
   :start: 4

.. _Data_Types:

**********
Data Types
**********

Exercises
=========

Exercise
--------

Assume that we execute the following assignment statements: ::

    width = 17
    height = 12.0
    delimiter = '.'

For each of the following expressions, write the value of the expression and the type (of the value of
the expression) and explain.

#. width / 2
#. width / 2.0
#. height / 3
#. 1 + 2 * 5

Use the Python interpreter to check your answers. ::

    >>> width = 17
    >>> height = 12.0
    >>> delimiter = '.'
    >>>
    >>> width / 2
    8.5
    >>> # in Python 3, / always performs a float division, even when both operands are integers
    >>> # (Python 2 performed a Euclidean division here, threw out the remainder and returned 8)
    >>> width / 2.0
    8.5
    >>> height / 3
    4.0
    >>> # one of the operands is a float (2.0 or height), so Python performs a float division,
    >>> # but keep in mind that float numbers are approximations.
    >>> # If you need exact decimal arithmetic, use Decimal (from the decimal module).
    >>> # But operations on Decimal are slow and floats offer quite enough precision,
    >>> # so we use Decimal only when we need great precision.
    >>> 1 + 2 * 5
    11
    >>> # the multiplication has a higher precedence than the addition; the result is an int
    >>> # Euclidean division
    >>> 2 // 3
    0
    >>> # float division
    >>> 2 / 3
    0.6666666666666666


Exercise
--------

Write a function which takes a radius as input and returns the volume of a sphere.

The volume of a sphere with radius r is 4/3 πr\ :sup:`3`.

What is the volume of a sphere with radius 5?

**Hint**: π is in the math module, so to access it you need to import the math module.
Place the ``import`` statement at the top of your file.
After that, you can use ``math.pi`` everywhere in the file, like this::

    >>> import math
    >>>
    >>> # do what you need to do
    >>> math.pi  # use math.pi

.. literalinclude:: _static/code/vol_of_sphere.py
   :linenos:
   :language: python

::

    python -i vol_of_sphere.py
    >>> vol_of_sphere(5)
    523.5987755982989

:download:`vol_of_sphere.py <_static/code/vol_of_sphere.py>` .
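The included file is not reproduced on this page. As a minimal sketch, assuming only that the function is called ``vol_of_sphere`` as in the session above, one possible implementation could look like this (it is an illustration, not necessarily the content of the downloadable file):

```python
import math


def vol_of_sphere(radius):
    """Return the volume of a sphere of the given radius: 4/3 * pi * r**3."""
    # 4.0 / 3.0 forces a float division whatever the Python version
    return 4.0 / 3.0 * math.pi * radius ** 3


# volume of a sphere with radius 5 (about 523.6)
print(vol_of_sphere(5))
```

Writing ``4.0 / 3.0`` rather than ``4 / 3`` keeps the result correct even with an interpreter where ``/`` between two integers is a Euclidean division.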

Exercise
--------

Draw what happens in memory when the following statements are executed: ::

    i = 12
    i += 2

.. figure:: _static/figs/augmented_assignment_int.png
   :width: 400px
   :alt: set
   :figclass: align-center

::

    >>> i = 12
    >>> id(i)
    33157200
    >>> i += 2
    >>> id(i)
    33157152

and ::

    s = 'gaa'
    s = s + 'ttc'

.. figure:: _static/figs/augmented_assignment_string.png
   :width: 400px
   :alt: set
   :figclass: align-center

::

    >>> s = 'gaa'
    >>> id(s)
    139950507582368
    >>> s = s + 'ttc'
    >>> s
    'gaattc'
    >>> id(s)
    139950571818896

What happens when an augmented assignment operator is used on an immutable object is that:

#. the operation is performed,
#. an object holding the result is created,
#. and then the target object reference is re-bound to refer to the result object
   rather than the object it referred to before.

So, in the preceding case, when the statement ``i += 2`` is encountered, Python computes 12 + 2,
stores the result in a new int object, and then rebinds ``i`` to refer to this new int.
If the original object ``i`` was referring to has no more references to it,
it will be scheduled for garbage collection.
The same mechanism applies to all immutable objects, including strings.


Exercise
--------

How can you obtain a new sequence which is 10 repetitions of this motif: "AGGTCGACCAGATTANTCCG"? ::

    >>> s = "AGGTCGACCAGATTANTCCG"
    >>> s10 = s * 10


Exercise
--------

Create a representation in fasta format of the following sequence:

.. note::
   A sequence in FASTA format begins with a single-line description, followed by lines of sequence data.
   The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column.
   The word following the ">" symbol is the identifier of the sequence, and the rest of the line is the description (optional).
   There should be no space between the ">" and the first letter of the identifier.
   The sequence ends if another line starting with a ">" appears; this indicates the start of another sequence.

::

    id = "sp|P60568|IL2_HUMAN"
    comment = "Interleukin-2 OS=Homo sapiens GN=IL2 PE=1 SV=1"
    sequence = """MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRML
    TFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSE
    TTFMCEYADETATIVEFLNRWITFCQSIISTLT"""

    >>> # the header line must start with ">" and the identifier is separated from the comment by a space
    >>> s = ">" + id + " " + comment + '\n' + sequence

    or

    >>> s = ">{id} {comment}\n{sequence}".format(id=id, comment=comment, sequence=sequence)


Exercise
--------

For the following exercise use the python file :download:`sv40 in fasta <_static/code/sv40_file.py>`,
which is a python file with the sequence of sv40 in fasta format already embedded,
and use ``python -i sv40_file.py`` to work.

How long is the sv40 in bp?
Hint: the fasta header is 61 characters long.
(http://www.ncbi.nlm.nih.gov/nuccore/J02400.1)

Write a function ``fasta_to_one_line`` that returns a sequence as a string
without the header or any non-sequence characters.

pseudocode:

| *function fasta_to_one_line(seq)*
|     *header_end_at <- find the first return line character*
|     *raw_seq <- remove header from sequence*
|     *raw_seq <- remove non sequence chars*
|     *return raw_seq*

.. literalinclude:: _static/code/fasta_to_one_line.py
   :linenos:
   :language: python

:download:`fasta_to_one_line.py <_static/code/fasta_to_one_line.py>` .

::

    python
    >>> import sv40_file
    >>> from fasta_to_one_line import fasta_to_one_line
    >>>
    >>> sv40_sequence = fasta_to_one_line(sv40_file.sv40_fasta)
    >>> print(len(sv40_sequence))
    5243

Do the following enzymes:

* BamHI (ggatcc),
* EcoRI (gaattc),
* HindIII (aagctt),
* SmaI (cccggg)

have recognition sites in sv40 (just answer True or False)? ::

    >>> "ggatcc".upper() in sv40_sequence
    True
    >>> "gaattc".upper() in sv40_sequence
    True
    >>> "aagctt".upper() in sv40_sequence
    True
    >>> "cccggg".upper() in sv40_sequence
    False

For the enzymes which have a recognition site, can you give their positions? ::

    >>> sv40_sequence = sv40_sequence.lower()
    >>> sv40_sequence.find("ggatcc")
    2532
    >>> # remember that strings are numbered from 0,
    >>> # so the position in the sequence is 2532 + 1 = 2533:
    >>> # the recognition motif of BamHI starts at 2533
    >>> sv40_sequence.find("gaattc")
    1781
    >>> # EcoRI -> 1782
    >>> sv40_sequence.find("aagctt")
    1045
    >>> # HindIII -> 1046

Is there only one site in sv40 per enzyme?

The ``find`` method gives the index of the first occurrence, or -1 if the substring is not found.
So we cannot determine the positions of all the occurrences of a site with the ``find`` method alone.
We can know how many sites are present with the ``count`` method.
We will see how to determine the positions of all occurrences when we learn looping and conditions.


Exercise
--------

We want to perform a PCR on sv40. Can you give the length and the sequence of the amplicon?

Write a function which has 3 parameters: ``sequence``, ``primer_1`` and ``primer_2``.

* *We consider only the cases where primer_1 and primer_2 are present in sequence.*
* *To simplify the exercise, the 2 primers can be read directly on the sv40 sequence.*

Test your algorithm with the following primers:

| primer_1 : 5' CGGGACTATGGTTGCTGACT 3'
| primer_2 : 5' TCTTTCCGCCTCAGAAGGTA 3'

Write the pseudocode before implementing it.

| *function amplicon_len(sequence, primer_1, primer_2)*
|     *pos_1 <- find position of primer_1 in sequence*
|     *pos_2 <- find position of primer_2 in sequence*
|     *amplicon length <- pos_2 + length(primer_2) - pos_1*
|     *return amplicon length*

.. literalinclude:: _static/code/amplicon_len.py
   :linenos:
   :language: python

::

    >>> import sv40_file
    >>> from fasta_to_one_line import fasta_to_one_line
    >>> from amplicon_len import amplicon_len
    >>>
    >>> first_primer = "CGGGACTATGGTTGCTGACT"
    >>> second_primer = "TCTTTCCGCCTCAGAAGGTA"
    >>> sequence = fasta_to_one_line(sv40_file.sv40_fasta)
    >>> print(amplicon_len(sequence, first_primer, second_primer))
    199

:download:`amplicon_len.py <_static/code/amplicon_len.py>` .
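The pseudocode above translates almost line by line into Python. The sketch below is only an illustration under the same assumptions as the exercise (both primers are present and can be read directly on the sequence, ``primer_1`` upstream of ``primer_2``). The helper ``amplicon_seq`` and the toy sequence are hypothetical names introduced here to answer the "sequence of the amplicon" part of the question; they do not come from the downloadable file.

```python
def amplicon_len(sequence, primer_1, primer_2):
    """Return the length of the amplicon delimited by the two primers.

    Assumes both primers occur in sequence, on the same strand,
    with primer_1 located before primer_2.
    """
    pos_1 = sequence.find(primer_1)
    pos_2 = sequence.find(primer_2)
    # the amplicon runs from the start of primer_1 to the end of primer_2
    return pos_2 + len(primer_2) - pos_1


def amplicon_seq(sequence, primer_1, primer_2):
    """Hypothetical helper: return the amplicon itself, using a slice."""
    pos_1 = sequence.find(primer_1)
    pos_2 = sequence.find(primer_2)
    return sequence[pos_1:pos_2 + len(primer_2)]


# toy data made up for the illustration (this is not sv40)
toy_sequence = "AAACGGTTTTTTGGCAAA"
print(amplicon_len(toy_sequence, "CGG", "GGC"))   # 12
print(amplicon_seq(toy_sequence, "CGG", "GGC"))   # CGGTTTTTTGGC
```

Since ``find`` returns -1 when a primer is absent, this sketch gives a meaningless result in that case, which is why the exercise restricts itself to primers that are present in the sequence.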


Exercise
--------

Reverse the following sequence "TACCTTCTGAGGCGGAAAGA" (don't compute the complement): ::

    >>> "TACCTTCTGAGGCGGAAAGA"[::-1]

    or

    >>> s = "TACCTTCTGAGGCGGAAAGA"
    >>> l = list(s)
    >>> # take care: reverse() reverses a list in place (the method works by side effect and returns None),
    >>> # so if you don't keep a reference to the list you cannot get the reversed list!
    >>> l.reverse()
    >>> print(l)
    >>> ''.join(l)

    or

    >>> rev_s = reversed(s)
    >>> ''.join(rev_s)

The most efficient way to reverse a string or a list is to use a slice.


Exercise
--------

| The il2_human contains 4 cysteines (C) in positions 9, 78, 125, 145.
| We want to generate the sequence of a mutant where the cysteines 78 and 125 are replaced by serines (S).
| Write the pseudocode before proposing an implementation:

We have to take care of the difference between string indices (numbered from 0)
and sequence positions (numbered from 1):

| C in seq -> in string
| 9 -> 8
| 78 -> 77
| 125 -> 124
| 145 -> 144

| *generate 3 slices from the il2_human*
| *head <- from the beginning up to, but not including, the 2nd cysteine*
| *body <- from the 2nd cysteine to the 3rd cysteine, both included*
| *tail <- from just after the 3rd cysteine until the end*
| *replace the cysteines of body by serines*
| *make the new sequence with head, body_mutate and tail*

::

    il2_human = 'MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSETTFMCEYADETATIVEFLNRWITFCQSIISTLT'

    head = il2_human[:77]
    body = il2_human[77:125]
    # the tail must start at index 125, just after the 3rd cysteine (index 124),
    # otherwise one residue is lost
    tail = il2_human[125:]
    body_mutate = body.replace('C', 'S')
    il2_mutate = head + body_mutate + tail


Exercise
--------

Write a function

* which takes a sequence as parameter,
* computes the GC%,
* and returns it.
* Display the result in a human-readable form, as a micro report like this:
  'the sv40 is 5243 bp long and has 40.80% gc'

Use the sv40 sequence to test your function.

.. literalinclude:: _static/code/gc_percent.py
   :linenos:
   :language: python

::

    >>> import sv40_file
    >>> from fasta_to_one_line import fasta_to_one_line
    >>> from gc_percent import gc_percent
    >>>
    >>> sequence = fasta_to_one_line(sv40_file.sv40_fasta)
    >>> gc_pc = gc_percent(sequence)
    >>> report = "the sv40 is {0} bp long and has {1:.2%} gc".format(len(sequence), gc_pc)
    >>> print(report)
    the sv40 is 5243 bp long and has 40.80% gc

:download:`gc_percent.py <_static/code/gc_percent.py>` .
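
The included file is not reproduced on this page. As a minimal sketch, here is one way such a function could be written. It assumes that the function returns the GC content as a fraction between 0 and 1, because the ``{1:.2%}`` format used in the report multiplies the value by 100 before displaying it. The toy sequence is made up for the illustration.

```python
def gc_percent(sequence):
    """Return the GC content of sequence as a fraction between 0 and 1.

    Assumption: a fraction (not a value between 0 and 100) is returned,
    so that it can be displayed with the '%' format specifier.
    """
    seq = sequence.upper()  # accept lower case as well as upper case sequences
    gc_count = seq.count("G") + seq.count("C")
    # float() forces a float division whatever the Python version
    return float(gc_count) / len(seq)


# toy sequence made up for the illustration (this is not sv40)
toy_sequence = "ATGCGC"
gc_pc = gc_percent(toy_sequence)
report = "the toy sequence is {0} bp long and has {1:.2%} gc".format(len(toy_sequence), gc_pc)
print(report)  # the toy sequence is 6 bp long and has 66.67% gc
```

This sketch raises a ``ZeroDivisionError`` on an empty sequence; handling that case requires conditions, which are covered later.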