.. sectnum::
   :start: 4

.. _Data_Types:

**********
Data Types
**********

Exercises
=========

Exercise
--------

Assume that we execute the following assignment statements:

::

   width = 17
   height = 12.0
   delimiter = '.'

For each of the following expressions, write the value of the expression and the type (of the value of
the expression) and explain.

#. width / 2
#. width / 2.0
#. height / 3
#. 1 + 2 * 5

Use the Python interpreter to check your answers.

::

   >>> width = 17
   >>> height = 12.0
   >>> delimiter = '.'
   >>>
   >>> width / 2
   8.5
   >>> # in Python 3, / always performs a float division, even when both operands are integers
   >>> # (in Python 2, 17 / 2 was a Euclidean division: the remainder was thrown out and the result was 8)
   >>> width / 2.0
   8.5
   >>> height / 3
   4.0
   >>> # one of the operands is a float (2.0 or height), so Python performs a float division,
   >>> # but keep in mind that float numbers are approximations.
   >>> # If you need exact decimal arithmetic, you need to use Decimal.
   >>> # But operations on Decimal are slow and floats offer quite enough precision,
   >>> # so we use Decimal only if we need great precision.
   >>> 1 + 2 * 5
   11
   >>> # the multiplication has precedence over the addition, and both operands are integers
   >>> # Euclidean division
   >>> 2 // 3
   0
   >>> # float division
   >>> 2 / 3
   0.6666666666666666


Exercise
--------

Write a function which takes a radius as input and returns the volume of a sphere.

The volume of a sphere with radius r is 4/3 πr\ :sup:`3`.

What is the volume of a sphere with radius 5?

**Hint**: π is in the ``math`` module, so to access it you need to import the ``math`` module.
Place the ``import`` statement at the top of your file.
After that, you can use ``math.pi`` everywhere in the file, like this::

   >>> import math
   >>>
   >>> # do what you need to do
   >>> math.pi  # use math.pi

.. literalinclude:: _static/code/vol_of_sphere.py
   :linenos:
   :language: python

::

   python -i vol_of_sphere.py
   >>> vol_of_sphere(5)
   523.5987755982989

:download:`vol_of_sphere.py <_static/code/vol_of_sphere.py>` .
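As a minimal sketch of the formula above (not necessarily identical to the downloadable ``vol_of_sphere.py``), the function could be written as follows, assuming Python 3 so that ``/`` is a float division:

```python
import math


def vol_of_sphere(radius):
    """Return the volume of a sphere with the given radius."""
    # 4/3 * pi * r^3; in Python 3, 4 / 3 is a float division
    return 4 / 3 * math.pi * radius ** 3


# volume of a sphere with radius 5
print(vol_of_sphere(5))
```

In Python 2, ``4 / 3`` would be evaluated as an integer division and give 1, so the formula would have to be written with ``4.0 / 3``.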

Exercise
--------

Draw what happens in memory when the following statements are executed:

::

   i = 12
   i += 2

.. figure:: _static/figs/augmented_assignment_int.png
   :width: 400px
   :alt: set
   :figclass: align-center

::

   >>> i = 12
   >>> id(i)
   33157200
   >>> i += 2
   >>> id(i)
   33157152

and

::

   s = 'gaa'
   s = s + 'ttc'

.. figure:: _static/figs/augmented_assignment_string.png
   :width: 400px
   :alt: set
   :figclass: align-center

::

   >>> s = 'gaa'
   >>> id(s)
   139950507582368
   >>> s = s + 'ttc'
   >>> s
   'gaattc'
   >>> id(s)
   139950571818896

When an augmented assignment operator is used on an immutable object:

#. the operation is performed,
#. an object holding the result is created,
#. and then the target object reference is re-bound to refer to the result object
   rather than the object it referred to before.

So, in the preceding case, when the statement ``i += 2`` is encountered, Python computes 12 + 2,
stores the result in a new int object, and then rebinds ``i`` to refer to this new int.
If the object ``i`` originally referred to has no more references to it,
it will be scheduled for garbage collection.
The same mechanism applies to all immutable objects, including strings.


Exercise
--------

How can you obtain a new sequence which is the motif "AGGTCGACCAGATTANTCCG" repeated 10 times?

::

   >>> s = "AGGTCGACCAGATTANTCCG"
   >>> s10 = s * 10


Exercise
--------

Create a representation in FASTA format of the following sequence:

.. note::
   A sequence in FASTA format begins with a single-line description, followed by lines of sequence data.
   The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column.
   The word following the ">" symbol is the identifier of the sequence, and the rest of the line is the description (optional).
   There should be no space between the ">" and the first letter of the identifier.
   The sequence ends if another line starting with a ">" appears; this indicates the start of another sequence.

::

   name = "sp|P60568|IL2_HUMAN"
   comment = "Interleukin-2 OS=Homo sapiens GN=IL2 PE=1 SV=1"
   sequence = """MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRML
   TFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSE
   TTFMCEYADETATIVEFLNRWITFCQSIISTLT"""

   >>> s = ">" + name + " " + comment + '\n' + sequence
   >>> # or
   >>> s = ">{name} {comment}\n{sequence}".format(name=name, comment=comment, sequence=sequence)
   >>> # or
   >>> s = f">{name} {comment}\n{sequence}"


Exercise
--------

For the following exercise, use the Python file :download:`sv40 in fasta <_static/code/sv40_file.py>`,
which is a Python file with the sequence of sv40 in FASTA format already embedded,
and use ``python -i sv40_file.py`` to work.

How long is the sv40 in bp?
Hint: the fasta header is 61 characters long.
(http://www.ncbi.nlm.nih.gov/nuccore/J02400.1)

Write a function ``fasta_to_one_line`` that returns a sequence as a string
without the header or any non-sequence characters.

Pseudocode:

| *function fasta_to_one_line(seq)*
|    *header_end_at <- find the first newline character*
|    *raw_seq <- remove header from sequence*
|    *raw_seq <- remove non sequence chars*
|    *return raw_seq*

.. literalinclude:: _static/code/fasta_to_one_line.py
   :linenos:
   :language: python

:download:`fasta_to_one_line.py <_static/code/fasta_to_one_line.py>`.

::

   python
   >>> import sv40_file
   >>> from fasta_to_one_line import fasta_to_one_line
   >>>
   >>> sv40_seq = fasta_to_one_line(sv40_file.sv40_fasta)
   >>> print(len(sv40_seq))
   5243

Consider the following restriction enzymes:

* BamHI (ggatcc)
* EcoRI (gaattc)
* HindIII (aagctt)
* SmaI (cccggg)

For each of them, tell whether it has recognition sites in sv40 (just answer with True or False).

::

   >>> "ggatcc".upper() in sv40_seq
   True
   >>> "gaattc".upper() in sv40_seq
   True
   >>> "aagctt".upper() in sv40_seq
   True
   >>> "cccggg".upper() in sv40_seq
   False

For the enzymes which have a recognition site, can you give their positions?

::

   >>> sv40_seq = sv40_seq.lower()
   >>> sv40_seq.find("ggatcc")
   2532
   >>> # remember that string indexes start at 0
   >>> 2532 + 1
   2533
   >>> # the recognition motif of BamHI starts at 2533
   >>> sv40_seq.find("gaattc")
   1781
   >>> # EcoRI -> 1782
   >>> sv40_seq.find("aagctt")
   1045
   >>> # HindIII -> 1046

Is there only one site in sv40 per enzyme?

The ``find`` method gives the index of the first occurrence, or -1 if the substring is not found.
So we cannot determine the number of occurrences of a site with the ``find`` method alone.

We can know how many sites are present with the ``count`` method.

::

   >>> sv40_seq.count("ggatcc")
   1
   >>> sv40_seq.count("gaattc")
   1
   >>> sv40_seq.count("aagctt")
   6
   >>> sv40_seq.count("cccggg")
   0

We will see how to determine all occurrences of restriction sites when we learn looping and conditions.


Exercise
--------

We want to perform a PCR on sv40. Can you give the length and the sequence of the amplicon?

Write a function which has 3 parameters ``sequence``, ``primer_1`` and ``primer_2`` and returns the amplicon length.

* *We consider only the cases where primer_1 and primer_2 are present in the sequence.*
* *To simplify the exercise, the 2 primers can be read directly in the sv40 sequence (i.e. no need to reverse-complement).*

Test your algorithm with the following primers:

| primer_1 : 5' CGGGACTATGGTTGCTGACT 3'
| primer_2 : 5' TCTTTCCGCCTCAGAAGGTA 3'

Write the function in pseudocode before implementing it.

| *function amplicon_len(sequence, primer_1, primer_2)*
|    *pos_1 <- find position of primer_1 in sequence*
|    *pos_2 <- find position of primer_2 in sequence*
|    *amplicon length <- pos_2 + length(primer_2) - pos_1*
|    *return amplicon length*

.. literalinclude:: _static/code/amplicon_len.py
   :linenos:
   :language: python

::

   >>> import sv40_file
   >>> from fasta_to_one_line import fasta_to_one_line
   >>> from amplicon_len import amplicon_len
   >>>
   >>> first_primer = "CGGGACTATGGTTGCTGACT"
   >>> second_primer = "TCTTTCCGCCTCAGAAGGTA"
   >>> sequence = fasta_to_one_line(sv40_file.sv40_fasta)
   >>> print(amplicon_len(sequence, first_primer, second_primer))
   199

:download:`amplicon_len.py <_static/code/amplicon_len.py>`.


Exercise
--------

#. Reverse the following sequence ``"TACCTTCTGAGGCGGAAAGA"`` (don't compute the complement).

::

   >>> "TACCTTCTGAGGCGGAAAGA"[::-1]
   # or
   >>> s = "TACCTTCTGAGGCGGAAAGA"
   >>> l = list(s)
   # take care: reverse() reverses a list in place (the method has a side effect and returns None)
   # so if you don't have an object reference on the list you cannot get the reversed list!
   >>> l.reverse()
   >>> print(l)
   >>> ''.join(l)
   # or
   >>> rev_s = reversed(s)
   >>> ''.join(rev_s)

The most efficient way to reverse a string or a list is to use the slice.

.. #. Using the shorter string ``s = 'gaattc'`` draw what happens in memory when you reverse ``s``.


Exercise
--------

| The ``il2_human`` sequence contains 4 cysteines (C) in positions 9, 78, 125, 145.
| We want to generate the sequence of a mutant where the cysteines 78 and 125 are replaced by serines (S).
| Write the pseudocode before proposing an implementation:

We have to take care of the difference between Python string numbering and usual position numbering:

| C in seq -> in string
| 9 -> 8
| 78 -> 77
| 125 -> 124
| 145 -> 144

| *generate 3 slices from the il2_human*
| *head <- from the beginning and cut between the first cysteine and the second*
| *body <- include the 2nd and 3rd cysteine*
| *tail <- cut after the 3rd cysteine until the end*
| *replace body cysteine by serine*
| *make new sequence with head body_mutate tail*

::

   il2_human = 'MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSETTFMCEYADETATIVEFLNRWITFCQSIISTLT'
   head = il2_human[:77]
   body = il2_human[77:125]
   # the tail must start right after the 3rd cysteine (index 124), otherwise one residue is lost
   tail = il2_human[125:]
   body_mutate = body.replace('C', 'S')
   il2_mutate = head + body_mutate + tail


Exercise
--------

Write a function which:

* takes a sequence as parameter;
* computes the GC%;
* and returns it;
* displays the results as a "human-readable" micro report like this:
  ``'The sv40 is 5243 bp length and has 40.80% gc'``.

Use the sv40 sequence to test your function.

.. literalinclude:: _static/code/gc_percent.py
   :linenos:
   :language: python

::

   >>> import sv40_file
   >>> from fasta_to_one_line import fasta_to_one_line
   >>> from gc_percent import gc_percent
   >>>
   >>> sequence = fasta_to_one_line(sv40_file.sv40_fasta)
   >>> gc_pc = gc_percent(sequence)
   >>> report = "The sv40 is {0} bp length and has {1:.2%} gc".format(len(sequence), gc_pc)
   >>> print(report)
   The sv40 is 5243 bp length and has 40.80% gc

:download:`gc_percent.py <_static/code/gc_percent.py>` .
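
One possible way to write such a function is sketched below. This is an illustration, not necessarily the content of the downloadable ``gc_percent.py``. It assumes that the function returns a fraction between 0 and 1, since the ``{1:.2%}`` format used above multiplies the value by 100 itself. The short ``toy_seq`` is made up for the example and is not the sv40 sequence.

```python
def gc_percent(sequence):
    """Return the GC content of a nucleic sequence as a fraction between 0 and 1."""
    # work on an upper-case copy so that 'g' and 'G' are counted the same way
    seq = sequence.upper()
    gc_count = seq.count("G") + seq.count("C")
    return gc_count / len(seq)


# made-up sequence, used only to illustrate the report
toy_seq = "ggatccaagctt"
report = "The toy sequence is {0} bp length and has {1:.2%} gc".format(
    len(toy_seq), gc_percent(toy_seq)
)
print(report)
```

Note that this sketch raises a ``ZeroDivisionError`` on an empty sequence; a real implementation may want to handle that case explicitly.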