.. sectnum::
   :start: 4

.. _Data_Types:

**********
Data Types
**********

Exercises
=========

Exercise
--------

Assume that we execute the following assignment statements:

::

    width = 17
    height = 12.0
    delimiter = '.'

For each of the following expressions, write the value of the expression and the type (of the value of
the expression) and explain.

#. width / 2
#. width / 2.0
#. height / 3
#. 1 + 2 * 5

Use the Python interpreter to check your answers.

::

    >>> width = 17
    >>> height = 12.0
    >>> delimiter = '.'
    >>>
    >>> width / 2
    8.5
    >>> # in Python 3, / always performs a float division, even when both operands are integers
    >>> width / 2.0
    8.5
    >>> height / 3
    4.0
    >>> # here one of the operands is a float (2.0 or height), so the result is a float too.
    >>> # Keep in mind that float numbers are approximations:
    >>> # if you need more precision, you need to use Decimal.
    >>> # But operations on Decimal are slow and floats offer quite enough precision,
    >>> # so we use Decimal only if we need great precision.
    >>> # Euclidean division
    >>> 2 // 3
    0
    >>> # float division
    >>> 2 / 3
    0.6666666666666666

Exercise
--------

Write a function which takes a radius as input and returns the volume of a sphere.

The volume of a sphere with radius r is 4/3 πr\ :sup:`3`.

What is the volume of a sphere with radius 5?

**Hint**: π is in the ``math`` module, so to access it you need to import the ``math`` module.
Place the ``import`` statement at the top of your file.
After that, you can use ``math.pi`` everywhere in the file, like this::

    >>> import math
    >>>
    >>> # do what you need to do
    >>> math.pi  # use math.pi

.. literalinclude:: _static/code/vol_of_sphere.py
   :linenos:
   :language: python

::

    python -i vol_of_sphere.py
    >>> vol_of_sphere(5)
    523.5987755982989

:download:`vol_of_sphere.py <_static/code/vol_of_sphere.py>`.
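
The included file is not reproduced on this page. As a minimal sketch, assuming the function is named ``vol_of_sphere`` as in the interpreter session above, the formula can be written like this:

```python
import math


def vol_of_sphere(radius):
    """Return the volume of a sphere of the given radius: 4/3 * pi * r**3."""
    # In Python 3, 4 / 3 is a float division, so no precision is lost here.
    return 4 / 3 * math.pi * radius ** 3


volume = vol_of_sphere(5)
print(volume)  # close to 523.6
```

This sketch relies on Python 3 division. With an integer division, ``4 / 3`` would evaluate to 1 and the result would be wrong.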

Exercise
--------

Draw what happens in memory when the following statements are executed:

::

    i = 12
    i += 2

.. figure:: _static/figs/augmented_assignment_int.png
   :width: 400px
   :alt: set
   :figclass: align-center

::

    >>> i = 12
    >>> id(i)
    33157200
    >>> i += 2
    >>> id(i)
    33157152

and

::

    s = 'gaa'
    s = s + 'ttc'

.. figure:: _static/figs/augmented_assignment_string.png
   :width: 400px
   :alt: set
   :figclass: align-center

::

    >>> s = 'gaa'
    >>> id(s)
    139950507582368
    >>> s = s + 'ttc'
    >>> s
    'gaattc'
    >>> id(s)
    139950571818896

When an augmented assignment operator is used on an immutable object:

#. the operation is performed,
#. an object holding the result is created,
#. and then the target object reference is re-bound to refer to the result object
   rather than the object it referred to before.

So, in the preceding case, when the statement ``i += 2`` is encountered, Python computes 12 + 2,
stores the result in a new int object, and then rebinds ``i`` to refer to this new int.
If the object ``i`` originally referred to has no more references to it,
it will be scheduled for garbage collection.
The same mechanism applies to all immutable objects, including strings.

Exercise
--------

How can you obtain a new sequence which repeats the following motif 10 times: "AGGTCGACCAGATTANTCCG"?

::

    >>> s = "AGGTCGACCAGATTANTCCG"
    >>> s10 = s * 10

Exercise
--------

Create a representation in FASTA format of the following sequence:

.. note::
   A sequence in FASTA format begins with a single-line description, followed by lines of sequence data.
   The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column.
   The word following the ">" symbol is the identifier of the sequence, and the rest of the line is the description (optional).
   There should be no space between the ">" and the first letter of the identifier.
   The sequence ends if another line starting with a ">" appears; this indicates the start of another sequence.

::

    name = "sp|P60568|IL2_HUMAN"
    comment = "Interleukin-2 OS=Homo sapiens GN=IL2 PE=1 SV=1"
    sequence = """MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRML
    TFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSE
    TTFMCEYADETATIVEFLNRWITFCQSIISTLT"""

    >>> s = ">" + name + " " + comment + '\n' + sequence
    or
    >>> s = ">{name} {comment}\n{sequence}".format(name=name, comment=comment, sequence=sequence)
    or
    >>> s = f">{name} {comment}\n{sequence}"

Exercise
--------

For the following exercise use the Python file :download:`sv40 in fasta <_static/code/sv40_file.py>`,
which is a Python file with the sequence of sv40 in FASTA format already embedded,
and use ``python -i sv40_file.py`` to work.

How long is the sv40 in bp?
Hint: the FASTA header is 61 characters long.
(http://www.ncbi.nlm.nih.gov/nuccore/J02400.1)

Write a function ``fasta_to_one_line`` that returns a sequence as a string without the header or any non-sequence characters.

Pseudocode:

| *function fasta_to_one_line(seq)*
|     *header_end_at <- find the first newline character*
|     *raw_seq <- remove header from sequence*
|     *raw_seq <- remove non sequence chars*
|     *return raw_seq*

.. literalinclude:: _static/code/fasta_to_one_line.py
   :linenos:
   :language: python

:download:`fasta_to_one_line.py <_static/code/fasta_to_one_line.py>`.

::

    python
    >>> import sv40_file
    >>> from fasta_to_one_line import fasta_to_one_line
    >>>
    >>> sv40_seq = fasta_to_one_line(sv40_file.sv40_fasta)
    >>> print(len(sv40_seq))
    5243

Consider the following restriction enzymes:

* BamHI (ggatcc)
* EcoRI (gaattc)
* HindIII (aagctt)
* SmaI (cccggg)

For each of them, tell whether it has recognition sites in sv40 (just answer by True or False).
::

    >>> "ggatcc".upper() in sv40_seq
    True
    >>> "gaattc".upper() in sv40_seq
    True
    >>> "aagctt".upper() in sv40_seq
    True
    >>> "cccggg".upper() in sv40_seq
    False

For the enzymes which have a recognition site, can you give their positions?

::

    >>> sv40_seq = sv40_seq.lower()
    >>> sv40_seq.find("ggatcc")
    2532
    >>> # remember that strings are indexed from 0
    >>> 2532 + 1
    2533
    >>> # the recognition motif of BamHI starts at 2533
    >>> sv40_seq.find("gaattc")
    1781
    >>> # EcoRI -> 1782
    >>> sv40_seq.find("aagctt")
    1045
    >>> # HindIII -> 1046

Is there only one site in sv40 per enzyme?

The ``find`` method gives the index of the first occurrence, or -1 if the substring is not found.
So we cannot determine the number of occurrences of a site with the ``find`` method alone.

We can know how many sites are present with the ``count`` method.

::

    >>> sv40_seq.count("ggatcc")
    1
    >>> sv40_seq.count("gaattc")
    1
    >>> sv40_seq.count("aagctt")
    6
    >>> sv40_seq.count("cccggg")
    0

We will see how to determine all occurrences of restriction sites when we learn looping and conditions.

Exercise
--------

We want to perform a PCR on sv40. Can you give the length and the sequence of the amplicon?

Write a function which has 3 parameters: ``sequence``, ``primer_1`` and ``primer_2``.

* *We consider only the cases where primer_1 and primer_2 are present in sequence.*
* *To simplify the exercise, the 2 primers can be read directly on the sv40 sequence.*

Test your algorithm with the following primers:

| primer_1 : 5' CGGGACTATGGTTGCTGACT 3'
| primer_2 : 5' TCTTTCCGCCTCAGAAGGTA 3'

Write the pseudocode before implementing it.

| *function amplicon_len(sequence, primer_1, primer_2)*
|     *pos_1 <- find position of primer_1 in sequence*
|     *pos_2 <- find position of primer_2 in sequence*
|     *amplicon length <- pos_2 + length(primer_2) - pos_1*
|     *return amplicon length*

.. literalinclude:: _static/code/amplicon_len.py
   :linenos:
   :language: python

::

    >>> import sv40_file
    >>> from fasta_to_one_line import fasta_to_one_line
    >>> from amplicon_len import amplicon_len
    >>>
    >>> first_primer = "CGGGACTATGGTTGCTGACT"
    >>> second_primer = "TCTTTCCGCCTCAGAAGGTA"
    >>> sequence = fasta_to_one_line(sv40_file.sv40_fasta)
    >>> print(amplicon_len(sequence, first_primer, second_primer))
    199

:download:`amplicon_len.py <_static/code/amplicon_len.py>`.
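
The pseudocode of this exercise can be sketched as follows. This is only an illustration, not necessarily the content of ``amplicon_len.py``: the helper ``amplicon_seq`` and the toy sequence are assumptions made for this example, and both primers are assumed to be present in the sequence, with ``primer_1`` located before ``primer_2``.

```python
def amplicon_len(sequence, primer_1, primer_2):
    """Return the length of the amplicon delimited by the two primers.

    Both primers are read directly on the sequence (no reverse complement).
    """
    pos_1 = sequence.find(primer_1)
    pos_2 = sequence.find(primer_2)
    # the amplicon runs from the first base of primer_1
    # to the last base of primer_2
    return pos_2 + len(primer_2) - pos_1


def amplicon_seq(sequence, primer_1, primer_2):
    """Return the amplicon itself (hypothetical helper, same assumptions)."""
    pos_1 = sequence.find(primer_1)
    pos_2 = sequence.find(primer_2)
    return sequence[pos_1:pos_2 + len(primer_2)]


# toy sequence invented for this example, not taken from sv40
toy = "ttttACGTggggggCCAAtttt"
print(amplicon_len(toy, "ACGT", "CCAA"))  # 14
print(amplicon_seq(toy, "ACGT", "CCAA"))  # ACGTggggggCCAA
```

Note that ``find`` returns -1 when a primer is absent, so this sketch gives a meaningless result in that case. This is why the exercise restricts itself to primers that are present in the sequence.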

Exercise
--------

Reverse the following sequence "TACCTTCTGAGGCGGAAAGA" (don't compute the complement):

::

    >>> "TACCTTCTGAGGCGGAAAGA"[::-1]

    or

    >>> s = "TACCTTCTGAGGCGGAAAGA"
    >>> l = list(s)
    # take care: reverse() reverses a list in place (the method has a side effect and returns None),
    # so if you don't have an object reference on the list you cannot get the reversed list!
    >>> l.reverse()
    >>> print(l)
    >>> ''.join(l)

    or

    >>> rev_s = reversed(s)
    >>> ''.join(rev_s)

The most efficient way to reverse a string or a list is to use a slice.

Exercise
--------

| The il2_human sequence contains 4 cysteines (C) in positions 9, 78, 125, 145.
| We want to generate the sequence of a mutant where the cysteines 78 and 125 are replaced by serines (S).
| Write the pseudocode before proposing an implementation:

We have to take care of the difference between string numbering and sequence numbering:

| C in seq -> in string
| 9 -> 8
| 78 -> 77
| 125 -> 124
| 145 -> 144

| *generate 3 slices from the il2_human*
| *head <- from the beginning, cut between the first cysteine and the second*
| *body <- include the 2nd and 3rd cysteine*
| *tail <- cut after the 3rd cysteine until the end*
| *replace body cysteine by serine*
| *make new sequence with head body_mutate tail*

::

    il2_human = 'MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSETTFMCEYADETATIVEFLNRWITFCQSIISTLT'

    head = il2_human[:77]
    body = il2_human[77:125]
    # the tail starts right after the 3rd cysteine (string index 124)
    tail = il2_human[125:]
    body_mutate = body.replace('C', 'S')
    il2_mutate = head + body_mutate + tail

Exercise
--------

Write a function

* which takes a sequence as parameter,
* computes the GC%
* and returns it.
* Display the results in a human-readable form, as a micro report like this:
  'the sv40 is 5243 bp length and have 40.80% gc'

Use the sv40 sequence to test your function.

.. literalinclude:: _static/code/gc_percent.py
   :linenos:
   :language: python

::

    >>> import sv40_file
    >>> from fasta_to_one_line import fasta_to_one_line
    >>> from gc_percent import gc_percent
    >>>
    >>> sequence = fasta_to_one_line(sv40_file.sv40_fasta)
    >>> gc_pc = gc_percent(sequence)
    >>> report = "the sv40 is {0} bp length and have {1:.2%} gc".format(len(sequence), gc_pc)
    >>> print(report)
    the sv40 is 5243 bp length and have 40.80% gc

:download:`gc_percent.py <_static/code/gc_percent.py>`.
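
A possible sketch of such a function is given below. It is an illustration only and may differ from ``gc_percent.py``; the toy sequence is invented for this example. The function returns a fraction between 0 and 1 rather than a percentage, because the ``.2%`` format specifier used in the report multiplies the value by 100 itself.

```python
def gc_percent(sequence):
    """Return the GC content of sequence as a fraction between 0 and 1."""
    seq = sequence.lower()  # count upper and lower case bases alike
    gc_count = seq.count("g") + seq.count("c")
    return gc_count / len(seq)


# toy sequence invented for this example: 6 G or C out of 10 bases
toy = "AGGTCGACCA"
report = "the toy sequence is {0} bp length and have {1:.2%} gc".format(
    len(toy), gc_percent(toy))
print(report)  # the toy sequence is 10 bp length and have 60.00% gc
```

This sketch raises a ``ZeroDivisionError`` on an empty sequence; handling that case is left out to keep the example short.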