diff --git a/source/Data_Types.rst b/source/Data_Types.rst
index 8a78be73b8fa6c9f55c957e68cf25d3d92aa27ae..dc077558a8901080e2c7d5a67ca47cfa18d5ca51 100644
--- a/source/Data_Types.rst
+++ b/source/Data_Types.rst
@@ -277,25 +277,26 @@ We will see how to determine all occurrences of restriction sites when we learn
 Exercise
 --------
 
-We want to perform a PCR on sv40, can you give the length and the sequence of the amplicon?
+We want to perform a PCR on sv40. Can you give the length and the sequence of the amplicon?
 
-Write a function which have 3 parameters ``sequence``, ``primer_1`` and ``primer_2``
+Write a function which takes 3 parameters ``sequence``, ``primer_1`` and ``primer_2``
 and returns the amplicon length.
 
-* *We consider only the cases where primer_1 and primer_2 are present in sequence*
-* *to simplify the exercise, the 2 primers can be read directly on the sv40 sequence.*
+* *We consider only the cases where primer_1 and primer_2 are present in the sequence.*
+* *To simplify the exercise, the 2 primers can be read directly in the sv40 sequence (i.e. no need to reverse-complement).*
 
-test you algorithm with the following primers
+Test your algorithm with the following primers:
 
 | primer_1 : 5' CGGGACTATGGTTGCTGACT 3'
 | primer_2 : 5' TCTTTCCGCCTCAGAAGGTA 3'
 
-Write the pseudocode before to implement it.
-| *function amplicon_len(sequence primer_1, primer_2)*
+Write the function in pseudocode before implementing it.
+
+| *function amplicon_len(sequence, primer_1, primer_2)*
 | *pos_1 <- find position of primer_1 in sequence*
 | *pos_2 <- find position of primer_2 in sequence*
 | *amplicon length <- pos_2 + length(primer_2) - pos_1*
-| *return amplicon length* 
+| *return amplicon length*
 
 .. literalinclude:: _static/code/amplicon_len.py
 
@@ -304,44 +305,50 @@ Write the pseudocode before to implement it.
 
 ::
 
-    >>> import sv40 
-    >>> import fasta_to_one_line
-    >>> 
+    >>> from sv40 import sv40
+    >>> from fasta_to_one_line import fasta_to_one_line
+    >>>
     >>> sequence = fasta_to_one_line(sv40)
-    >>> print amplicon_len(sequence, first_primer, second_primer )
+    >>> print amplicon_len(sequence, 'CGGGACTATGGTTGCTGACT', 'TCTTTCCGCCTCAGAAGGTA')
     199
-    
-:download:`amplicon_len.py <_static/code/amplicon_len.py>` .
+
+:download:`amplicon_len.py <_static/code/amplicon_len.py>`.
 
 
 Exercise
 --------
 
-reverse the following sequence "TACCTTCTGAGGCGGAAAGA" (don't compute the complement): ::
+#. Reverse the following sequence ``"TACCTTCTGAGGCGGAAAGA"`` (don't compute the complement).
+
+::
 
     >>> "TACCTTCTGAGGCGGAAAGA"[::-1]
-    or
+    # or
     >>> s = "TACCTTCTGAGGCGGAAAGA"
-    >>> l = list(s) 
-    # take care reverse() reverse a list in place (the method do a side effect and return None )
-    # so if you don't have a object reference on the list you cannot get the reversed list!
+    >>> l = list(s)
+    # take care: list.reverse() reverses the list in place (a side effect) and returns None,
+    # so you need a reference to the list to get the reversed result!
     >>> l.reverse()
     >>> print l
     >>> ''.join(l)
-    or
+    # or
     >>> rev_s = reversed(s)
-    ''.join(rev_s)
-    
-    The most efficient way to reverse a string or a list is the way using the slice. 
+    >>> ''.join(rev_s)
+
+    The most efficient way to reverse a string or a list is to use a slice: s[::-1] (it returns a reversed copy).
+
+.. #. Using the shorter string ``s = 'gaattc'`` draw what happens in memory when you reverse ``s``.
+
 
 Exercise
 --------
 
-| The il2_human contains 4 cysteins (C) in positions 9, 78, 125, 145.
-| We want to generate the sequence of a mutatnt were the cysteins 78 and 125 are replaced by serins (S)
-| Write the pseudocode, before to propose an implementation:
+| The ``il2_human`` sequence contains 4 cysteines (C) at positions 9, 78, 125 and 145.
+| We want to generate the sequence of a mutant where the cysteines 78 and 125 are replaced by serines (S).
+| Write the pseudocode before proposing an implementation:
+
 
-We have to take care of the string numbered vs sequence numbered:
+We have to take care of the difference between Python string indexing, which starts at 0, and the usual sequence numbering, which starts at 1:
 
 | C in seq -> in string
 | 9 -> 8
@@ -350,36 +357,34 @@ We have to take care of the string numbered vs sequence numbered:
 | 145 -> 144
 
 | *generate 3 slices from the il2_human*
-| *head <- from the begining and cut between the first cytein and the second*
-| *body <- include the 2nd and 3rd cystein*
-| *tail <- cut after the 3rd cystein until the end* 
-| *replace body cystein by serin* 
+| *head <- from the beginning, cut between the 1st and the 2nd cysteine*
+| *body <- include the 2nd and 3rd cysteines*
+| *tail <- cut after the 3rd cysteine until the end*
+| *body_mutate <- replace the cysteines of body by serines*
 | *make new sequence with head body_mutate tail*
-
-il2_human =
-'MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSETTFMCEYADETATIVEFLNRWITFCQSIISTLT'
-
-
+
+
 ::
 
-    head = il2_human[:77]
-    body = il2_human[77:125]
-    tail = il2_human[126:]
-    body_mutate = body.replace('C', 'S')
-    il2_mutate = head + body_mutate + tail
+    il2_human = 'MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSETTFMCEYADETATIVEFLNRWITFCQSIISTLT'
+    head = il2_human[:77]        # positions 1-77 (Cys 9 unchanged)
+    body = il2_human[77:125]     # positions 78-125 (Cys 78 and Cys 125)
+    tail = il2_human[125:]       # positions 126-153 (Cys 145 unchanged)
+    body_mutate = body.replace('C', 'S')
+    il2_mutate = head + body_mutate + tail
 
 Exercise
 --------
 
-Write a function
+Write a function which:
 
-* which take a sequence as parameter
-* compute the GC%
-* and return it
-* display the results readable for human as a micro report like this:
-  'the sv40 is 5243 bp length and have 40.80% gc'
-
-use sv40 sequence to test your function.
+* takes a sequence as a parameter;
+* computes its GC%;
+* returns it;
+* and displays the result as a human-readable micro report, like this:
+  ``'The sv40 is 5243 bp long and has 40.80% GC'``.
+
+Use the sv40 sequence to test your function.
 
 .. literalinclude:: _static/code/gc_percent.py
    :linenos:
@@ -387,14 +392,14 @@ use sv40 sequence to test your function.
 
 ::
 
-    >>> import sv40 
-    >>> import fasta_to_one_line
-    >>> import gc_percent
-    >>> 
+    >>> from sv40 import sv40
+    >>> from fasta_to_one_line import fasta_to_one_line
+    >>> from gc_percent import gc_percent
+    >>>
     >>> sequence = fasta_to_one_line(sv40)
     >>> gc_pc = gc_percent(sequence)
-    >>> report = "the sv40 is {0} bp length and have {1:.2%} gc".format(len(sequence), gc_pc)
+    >>> report = "The sv40 is {0} bp long and has {1:.2%} GC".format(len(sequence), gc_pc)
     >>> print report
-    'the sv40 is 5243 bp length and have 40.80% gc'
-    
-:download:`gc_percent.py <_static/code/gc_percent.py>` .
+    The sv40 is 5243 bp long and has 40.80% GC
+
+:download:`gc_percent.py <_static/code/gc_percent.py>`.
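The amplicon and GC% exercises above can be sketched as follows. This is a hedged illustration, not the contents of ``amplicon_len.py`` or ``gc_percent.py``, which are not shown here. ``amplicon_len`` follows the pseudocode step by step. ``gc_percent`` returns a fraction between 0 and 1, so that the ``{:.2%}`` format spec in the report does the scaling. The code assumes Python 3 (``print()`` function, true division), unlike the Python 2 sessions in the diff. The ``gc_report`` helper and the toy sequence are hypothetical.

```python
# Minimal Python 3 sketches for the amplicon and GC% exercises.
# Illustrative only: not the tutorial's amplicon_len.py / gc_percent.py.


def amplicon_len(sequence, primer_1, primer_2):
    """Return the amplicon length, following the pseudocode.

    Assumes both primers occur in ``sequence`` exactly as written
    (no reverse complement). str.index raises ValueError if a primer
    is missing, a case the exercise explicitly leaves out.
    """
    pos_1 = sequence.index(primer_1)
    pos_2 = sequence.index(primer_2)
    return pos_2 + len(primer_2) - pos_1


def gc_percent(sequence):
    """Return the GC content of ``sequence`` as a fraction (0.0 to 1.0).

    Case is ignored. Returning a fraction lets '{:.2%}' print it as a
    percentage with two decimals.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count('G') + seq.count('C')) / len(seq)


def gc_report(name, sequence):
    """Hypothetical helper building the human-readable micro report."""
    return "The {0} is {1} bp long and has {2:.2%} GC".format(
        name, len(sequence), gc_percent(sequence))


if __name__ == '__main__':
    toy = 'AAACGTTTTGCAAA'  # made-up sequence, not sv40
    print(amplicon_len(toy, 'CGT', 'GCA'))  # 9
    print(gc_report('toy', toy))  # The toy is 14 bp long and has 28.57% GC
```

With the toy sequence, the amplicon is ``toy[3:12]``, i.e. ``'CGTTTTGCA'``, which is 9 bp long. Using ``str.index`` rather than ``str.find`` makes a missing primer fail loudly instead of silently computing a length from position ``-1``.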