hub-courses / python_one_week_4_biologists_solutions
Commit 561097d6, authored Jun 08, 2021 by Blaise Li
Minor language and format updates.
Parent: 9f15a02a
Pipeline #58073 passed with stages in 13 seconds
source/Data_Types.rst
...
...
@@ -277,25 +277,26 @@ We will see how to determine all occurrences of restriction sites when we learn
Exercise
--------
We want to perform a PCR on sv40, can you give the length and the sequence of the amplicon?
We want to perform a PCR on sv40. Can you give the length and the sequence of the amplicon?
Write a function which have 3 parameters ``sequence``, ``primer_1`` and ``primer_2``
Write a function which has 3 parameters ``sequence``, ``primer_1`` and ``primer_2``
and returns the amplicon length.
* *We consider only the cases where primer_1 and primer_2 are present in sequence*
* *to simplify the exercise, the 2 primers can be read directly on the sv40 sequence.*
* *We consider only the cases where primer_1 and primer_2 are present in the sequence.*
* *To simplify the exercise, the 2 primers can be read directly in the sv40 sequence (i.e. no need to reverse-complement).*
test you algorithm with the following primers
Test you algorithm with the following primers:
| primer_1 : 5' CGGGACTATGGTTGCTGACT 3'
| primer_2 : 5' TCTTTCCGCCTCAGAAGGTA 3'
Write the pseudocode before to implement it.
Write the function in pseudocode before implementing it.
| *function amplicon_len(sequence primer_1, primer_2)*
| *pos_1 <- find position of primer_1 in sequence*
| *pos_2 <- find position of primer_2 in sequence*
| *amplicon length <- pos_2 + length(primer_2) - pos_1*
| *return amplicon length*
| *return amplicon length*
.. literalinclude:: _static/code/amplicon_len.py
...
...
@@ -304,44 +305,50 @@ Write the pseudocode before to implement it.
::
>>> import sv40
>>> import sv40
>>> import fasta_to_one_line
>>>
>>>
>>> sequence = fasta_to_one_line(sv40)
>>> print amplicon_len(sequence, first_primer, second_primer )
199
:download:`amplicon_len.py <_static/code/amplicon_len.py>` .
:download:`amplicon_len.py <_static/code/amplicon_len.py>` .
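For reference, the pseudocode in this hunk could translate to Python roughly as follows. This is only a sketch: the actual content of `_static/code/amplicon_len.py` is not shown in this diff, the toy sequence and its primers are made up for illustration, and the code is written for Python 3 whereas the session above uses Python 2 `print`. It assumes, as the exercise states, that both primers are present and can be read directly on the sequence, and that the sequence and primers use the same letter case.

```python
def amplicon_len(sequence, primer_1, primer_2):
    """Return the amplicon length, assuming both primers occur in sequence
    and can be read directly on it (no reverse-complement)."""
    pos_1 = sequence.find(primer_1)  # start index of the first primer
    pos_2 = sequence.find(primer_2)  # start index of the second primer
    return pos_2 + len(primer_2) - pos_1


# Toy sequence and primers for illustration only (not sv40).
toy_sequence = "AAACGGTTTTGCAAA"
length = amplicon_len(toy_sequence, "CGG", "GCA")

# The amplicon sequence itself is the slice starting at the first primer.
start = toy_sequence.find("CGG")
amplicon = toy_sequence[start:start + length]
print(length, amplicon)
```

The same slice gives the amplicon sequence that the exercise also asks for.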
Exercise
--------
reverse the following sequence "TACCTTCTGAGGCGGAAAGA" (don't compute the complement): ::
#. Reverse the following sequence ``"TACCTTCTGAGGCGGAAAGA"`` (don't compute the complement).
::
>>> "TACCTTCTGAGGCGGAAAGA"[::-1]
or
# or
>>> s = "TACCTTCTGAGGCGGAAAGA"
>>> l = list(s)
>>> l = list(s)
# take care reverse() reverse a list in place (the method do a side effect and return None )
# so if you don't have a object reference on the list you cannot get the reversed list!
>>> l.reverse()
>>> print l
>>> ''.join(l)
or
# or
>>> rev_s = reversed(s)
''.join(rev_s)
The most efficient way to reverse a string or a list is the way using the slice.
The most efficient way to reverse a string or a list is the way using the slice.
.. #. Using the shorter string ``s = 'gaattc'`` draw what happens in memory when you reverse ``s``.
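The three approaches shown in this hunk can be put side by side in a short script to check that they agree. This is a Python 3 sketch; the variable names `by_slice`, `by_list` and `by_reversed` are introduced here for illustration.

```python
s = "TACCTTCTGAGGCGGAAAGA"

# 1. Slice with a negative step: builds the reversed string directly.
by_slice = s[::-1]

# 2. list.reverse() works in place and returns None,
#    so a reference to the list must be kept.
chars = list(s)
chars.reverse()
by_list = "".join(chars)

# 3. reversed() returns an iterator over the characters.
by_reversed = "".join(reversed(s))

print(by_slice)
print(by_slice == by_list == by_reversed)
```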
Exercise
--------
| The il2_human contains 4 cysteins (C) in positions 9, 78, 125, 145.
| We want to generate the sequence of a mutatnt were the cysteins 78 and 125 are replaced by serins (S)
| Write the pseudocode, before to propose an implementation:
| The ``il2_human`` sequence contains 4 cysteins (C) in positions 9, 78, 125, 145.
| We want to generate the sequence of a mutant where the cysteins 78 and 125 are replaced by serins (S)
| Write the pseudocode, before proposing an implementation:
We have to take care of the string numbered vs sequence numbered:
We have to take care of the difference between Python string numbering and usual position numbering:
| C in seq -> in string
| 9 -> 8
...
...
@@ -350,36 +357,34 @@ We have to take care of the string numbered vs sequence numbered:
| 145 -> 144
| *generate 3 slices from the il2_human*
| *head <- from the begining and cut between the first cytein and the second*
| *head <- from the begining and cut between the first cystein and the second*
| *body <- include the 2nd and 3rd cystein*
| *tail <- cut after the 3rd cystein until the end*
| *replace body cystein by serin*
| *tail <- cut after the 3rd cystein until the end*
| *replace body cystein by serin*
| *make new sequence with head body_mutate tail*
il2_human = 'MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSETTFMCEYADETATIVEFLNRWITFCQSIISTLT'
::
head = il2_human[:77]
body = il2_human[77:125]
tail = il2_human[126:]
body_mutate = body.replace('C', 'S')
il2_mutate = head + body_mutate + tail
il2_human = 'MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSETTFMCEYADETATIVEFLNRWITFCQSIISTLT'
head = il2_human[:77]
body = il2_human[77:125]
tail = il2_human[126:]
body_mutate = body.replace('C', 'S')
il2_mutate = head + body_mutate + tail
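A side note on the slices in this hunk: since `body` ends at index 124 (position 125), `tail = il2_human[126:]` appears to skip the residue at index 125, which would make the mutant one residue shorter than the original. The Python 3 sketch below uses `[125:]` instead and checks that the length is preserved. The names `first` and `second` are introduced here for illustration; this is a suggestion, not the course's own solution.

```python
il2_human = (
    "MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKH"
    "LQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSETTFMCEYADETATIVEFLNRWITFCQSIISTLT"
)

# 1-based positions of the two cysteines to mutate, converted to 0-based indices.
first, second = 78 - 1, 125 - 1

head = il2_human[:first]            # everything before the 2nd cysteine
body = il2_human[first:second + 1]  # from the 2nd to the 3rd cysteine, inclusive
tail = il2_human[second + 1:]       # starts at index 125, so no residue is dropped

il2_mutate = head + body.replace("C", "S") + tail

print(il2_mutate)
print(len(il2_human), len(il2_mutate))
```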
Exercise
--------
Write a function
Write a function which:
* which take a sequence as parameter
* compute the GC%
* and return it
* display the results readable for human as a micro report like this:
  'the sv40 is 5243 bp length and have 40.80% gc'
use sv40 sequence to test your function.
* takes a sequence as parameter;
* computes the GC%;
* and returns it;
* displays the results as a "human-readable" micro report like this:
  ``'The sv40 is 5243 bp length and has 40.80% gc'``.
Use the sv40 sequence to test your function.
.. literalinclude:: _static/code/gc_percent.py
:linenos:
...
...
@@ -387,14 +392,14 @@ use sv40 sequence to test your function.
::
>>> import sv40
>>> import sv40
>>> import fasta_to_one_line
>>> import gc_percent
>>>
>>>
>>> sequence = fasta_to_one_line(sv40)
>>> gc_pc = gc_percent(sequence)
>>> report = "the sv40 is {0} bp length and have {1:.2%} gc".format(len(sequence), gc_pc)
>>> report = "The sv40 is {0} bp length and has {1:.2%} gc".format(len(sequence), gc_pc)
>>> print report
'the sv40 is 5243 bp length and have 40.80% gc'
'The sv40 is 5243 bp length and has 40.80% gc'
:download:`gc_percent.py <_static/code/gc_percent.py>` .
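The content of `gc_percent.py` is not part of this diff. A minimal Python 3 sketch of what such a function might look like is given below, assuming it returns a fraction between 0 and 1 (which is what the `{1:.2%}` format in the session above expects). The toy sequence and the `"toy"` label in the report are made up for illustration.

```python
def gc_percent(sequence):
    """Return the GC content of sequence as a fraction between 0 and 1."""
    seq = sequence.upper()  # count g/c and G/C alike
    return (seq.count("G") + seq.count("C")) / len(seq)


# Toy sequence for illustration only (not sv40).
toy = "atgc"
gc_pc = gc_percent(toy)

# Same report format as in the session above; .2% multiplies by 100.
report = "The toy is {0} bp length and has {1:.2%} gc".format(len(toy), gc_pc)
print(report)
```

Returning a fraction and leaving the percentage formatting to the report keeps the function reusable in other computations.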