hub-courses / python_one_week_4_biologists_solutions
Commit 2e85f95b authored Jun 08, 2021 by Bertrand NÉRON
fix typos
parent 9cd31ca1
Changes 4
source/Collection_Data_Types.rst
...
...
@@ -170,7 +170,11 @@ from the list l = [1, 2, 3, 4, 5, 6, 7, 8, 9] generate 2 lists l1 containing all
l1 = l[::2]
l2 = l[1::2]
or ::
even = [item for item in l if item % 2 == 0]
odd = [item for item in l if item % 2 != 0]
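The two variants in this hunk select items differently: slicing picks by position, the comprehensions pick by value. A minimal sketch, assuming the list `l` from the exercise statement; the `shuffled` list is a hypothetical counter-example that is not part of the course material:

```python
l = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Slicing selects by position (indexes 0, 2, 4, ... and 1, 3, 5, ...).
l1 = l[::2]
l2 = l[1::2]

# The comprehensions select by value (parity of each item).
even = [item for item in l if item % 2 == 0]
odd = [item for item in l if item % 2 != 0]

# For this particular list both approaches give the same two groups.
print(l1 == odd, l2 == even)  # True True

# Hypothetical counter-example: positions and parities no longer line up.
shuffled = [2, 4, 1, 3]
by_position = shuffled[::2]                                  # [2, 1]
by_value = [item for item in shuffled if item % 2 == 0]      # [2, 4]
print(by_position, by_value)
```

The results coincide here only because the values 1 to 9 are stored in increasing order starting from an odd number.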
Exercise
--------
...
...
@@ -253,7 +257,7 @@ implementation:
>>> uniqify(l)
[1, 2, 3, 4, 5, 7, 8, 9]
-:download:`codons_itertools.py <_static/code/codons_itertools.py>` .
+:download:`uniqify.py <_static/code/uniqify.py>` .
second implementation:
""""""""""""""""""""""
...
...
@@ -262,7 +266,7 @@ The problem with the first implementation come from the line 4.
Remember that the membership operator uses a linear search for list, which can be slow for very large collections.
If we plan to use ``uniqify`` with large list we should find a better algorithm.
In the specification we can read that uniqify can work *regardless the order of the resulting list*.
-So we can use the specifycity of set ::
+So we can use the specificity of set ::
>>> list(set(l))
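To make the contrast between the two implementations concrete, here is a minimal sketch. The body of the first implementation is not visible in this hunk, so `uniqify_list` is an assumed reconstruction based on the remark about the membership operator, and the sample list is hypothetical:

```python
def uniqify_list(items):
    """Keep the first occurrence of each item, preserving order.

    `item not in uniq` is a linear search on a list, so the whole
    function is quadratic in the worst case.
    """
    uniq = []
    for item in items:
        if item not in uniq:
            uniq.append(item)
    return uniq


def uniqify_set(items):
    """Remove duplicates through a set; the resulting order is not guaranteed."""
    return list(set(items))


# Hypothetical input, chosen to contain duplicates.
l = [1, 2, 3, 2, 3, 4, 5, 1, 2, 3, 3, 2, 7, 8, 9]
print(uniqify_list(l))         # [1, 2, 3, 4, 5, 7, 8, 9]
print(sorted(uniqify_set(l)))  # same elements, sorted for comparison
```

The set-based version is acceptable only because the specification does not constrain the order of the result.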
...
...
@@ -330,13 +334,13 @@ Compare the pseudocode of each of them and implement the fastest one. ::
acggcaacatggctggccagtgggctctgagaggagaaagtccagtggatgctcttggtctggttcgtgagcgcaacaca"""
-In the first alogrithm.
+In the first algorithm.
| we first compute all kmers we generate 4\ :sup:`kmer length`
-| then we count the occurence of each kmer in the sequence
-| so for each kmer we read all the sequence so the algorith is in O( 4\ :sup:`kmer length` * ``sequence length``)
+| then we count the occurrence of each kmer in the sequence
+| so for each kmer we read all the sequence so the algorithm is in O( 4\ :sup:`kmer length` * ``sequence length``)
-| In the secon algorithm we read the sequence only once
+| In the second algorithm we read the sequence only once
| So the algorithm is in O(sequence length)
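The two strategies compared above can be sketched as follows. This is an illustration under stated assumptions, not the course's solution: the function names `count_all_kmers` and `count_seen_kmers` are hypothetical, the alphabet is assumed to be lower-case `acgt`, and the test sequence is a short prefix of the fragment visible in the hunk.

```python
from itertools import product


def count_all_kmers(seq, kmer_len, alphabet="acgt"):
    """First strategy: enumerate the 4**kmer_len possible kmers,
    then scan the whole sequence once per kmer."""
    counts = {}
    for letters in product(alphabet, repeat=kmer_len):
        kmer = "".join(letters)
        occ = 0
        # Check every start position so overlapping occurrences are counted.
        for i in range(len(seq) - kmer_len + 1):
            if seq[i:i + kmer_len] == kmer:
                occ += 1
        if occ:
            counts[kmer] = occ
    return counts


def count_seen_kmers(seq, kmer_len):
    """Second strategy: read the sequence once and count each kmer as it appears."""
    counts = {}
    for i in range(len(seq) - kmer_len + 1):
        kmer = seq[i:i + kmer_len]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


seq = "acggcaacatggctggccag"
same = count_all_kmers(seq, 3) == count_seen_kmers(seq, 3)
print(same)  # True: both strategies produce the same counts
```

Both functions return the same dictionary; only the amount of work differs, since the first one rereads the sequence for each of the 4\ :sup:`kmer length` candidates.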
...
...
@@ -441,14 +445,14 @@ pseudocode:
other solution
""""""""""""""
-python provide an interresting method for our problem.
+python provide an interesting method for our problem.
The ``translate`` method work on string and need a parameter which is a object
that can do the correspondance between characters in old string a the new one.
``maketrans`` is a function in module ``string`` that allow us to build this object.
``maketrans`` take 2 arguments, two strings, the first string contains the characters
to change, the second string the corresponding characters in the new string.
-Thus the two strings **must** have the same lenght. The correspondance between
-the characters to change and their new values is made in funtion of thier position.
+Thus the two strings **must** have the same length. The correspondance between
+the characters to change and their new values is made in function of their position.
the first character of the first string will be replaced by the first character of the second string,
the second character of the first string will be replaced by the second character of the second string, on so on.
So we can write the reverse complement without loop.
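A minimal sketch of the loop-free reverse complement described above. It assumes Python 3, where the table builder is the static method `str.maketrans` rather than a function of the `string` module, and it assumes lower-case input restricted to `acgt`:

```python
def rev_comp(seq):
    """Return the reverse complement of a lower-case DNA sequence."""
    # The two strings have the same length: 'a' -> 't', 'c' -> 'g',
    # 'g' -> 'c', 't' -> 'a', matched position by position.
    table = str.maketrans("acgt", "tgca")
    return seq[::-1].translate(table)


print(rev_comp("aacg"))  # cgtt
```

Characters absent from the table are left unchanged by `translate`, so ambiguous bases such as `n` would pass through untouched in this sketch.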
...
...
source/_static/code/kmer.py
...
...
@@ -7,6 +7,6 @@ def get_kmer_occurences(seq, kmer_len):
     kmers = {}
     stop = len(seq) - kmer_len
     for i in range(stop + 1):
-        kmer = s[i:i + kmer_len]
+        kmer = seq[i:i + kmer_len]
         kmers[kmer] = kmers.get(kmer, 0) + 1
     return kmers.items()
\ No newline at end of file
source/_static/code/kmer_2.py
import collections
from operator import itemgetter

def get_kmer_occurences(seq, kmer_len):
    """
    return a list of tuple
...
...
@@ -9,9 +10,8 @@ def get_kmer_occurences(seq, kmer_len):
     kmers = collections.defaultdict(int)
     stop = len(seq) - kmer_len
     for i in range(stop + 1):
-        kmer = s[i:i + kmer_len]
+        kmer = seq[i:i + kmer_len]
         kmers[kmer] += 1
     kmers = kmers.items()
-    kmers.sort(key=itemgetter(1), reverse=True)
+    kmers.sort(key=itemgetter(1), reverse=True)
     return kmers
\ No newline at end of file
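The committed code calls `.sort()` on the result of `.items()`, which works when `items()` returns a list (Python 2). In Python 3, `dict.items()` returns a view object that has no `sort` method. A possible Python 3 variant is sketched below; the name `get_kmer_occurences_py3` is hypothetical and the function is not part of this commit:

```python
import collections
from operator import itemgetter


def get_kmer_occurences_py3(seq, kmer_len):
    """Return a list of (kmer, occurrence) tuples, most frequent first."""
    kmers = collections.defaultdict(int)
    stop = len(seq) - kmer_len
    for i in range(stop + 1):
        kmer = seq[i:i + kmer_len]
        kmers[kmer] += 1
    # sorted() accepts any iterable, including a dict view, and returns a new list.
    return sorted(kmers.items(), key=itemgetter(1), reverse=True)


print(get_kmer_occurences_py3("aaaac", 2))  # [('aa', 3), ('ac', 1)]
```

`collections.Counter` with its `most_common()` method would be another way to get the same ordering.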
source/_static/code/rev_comp.py
View file @
2e85f95b
...
...
@@ -4,10 +4,10 @@ def rev_comp(seq):
return the reverse complement of seq
the sequence must be in lower case
"""
-    complement = {'a': 't', 'c': 'g', 'g': 'c', 't': 'a'}
+    complement = {'a': 't', 'c': 'g', 'g': 'c', 't': 'a'}
     rev_seq = seq[::-1]
     rev_comp = ''
     for nt in rev_seq:
...
...