hub-courses / python_one_week_4_biologists_solutions

Commit 09aaf26a authored 10 years ago by Bertrand NÉRON

add source for rebase parser

parent 01528585

Showing 2 changed files with 72 additions and 7 deletions:

source/Input_Output.rst (+33, -7)
source/_static/code/rebase.py (+39, -0)
source/Input_Output.rst (+33, -7)

...
@@ -32,7 +32,13 @@ Write a function which takes the path of a file in REBASE format
and returns, in a dictionary, the collection of enzymes contained in the file.
The sequence of each binding site must be cleaned up (only the bases A, C, G and T are kept).
Use the file :download:`rebase_light.txt <_static/data/rebase_light.txt>` to test your code.


.. literalinclude:: _static/code/rebase.py
   :linenos:
   :language: python

:download:`rebase.py <_static/code/rebase.py>` .
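
The exercise asks for a dictionary, whereas ``rebase_parser`` yields one ``(name, site)`` tuple at a time. A minimal sketch of the dictionary version is given below. It assumes, like ``rebase.py``, that the enzyme name is the first whitespace-separated field of a line and the binding site the third; the helper names ``clean_site`` and ``rebase_to_dict`` are hypothetical, and the two sample lines are made up, not real REBASE records.

```python
import io


def clean_site(site):
    """Keep only the bases A, C, G and T (same rule as clean_seq in rebase.py)."""
    return ''.join(char for char in site if char in 'ACGT')


def rebase_to_dict(rebase_file):
    """Hypothetical helper: map each enzyme name to its cleaned binding site.

    Assumption: whitespace-separated fields, with the enzyme name first and
    the binding site third, as rebase_parser in rebase.py expects.
    """
    enzymes = {}
    for line in rebase_file:
        fields = line.split()
        if len(fields) < 3:
            # blank or incomplete line: nothing to record
            continue
        enzymes[fields[0]] = clean_site(fields[2])
    return enzymes


# Usage on two made-up lines held in memory instead of a real file:
sample = io.StringIO("EnzA x GA^ATTC\nEnzB x CCWGG\n\n")
print(rebase_to_dict(sample))  # {'EnzA': 'GAATTC', 'EnzB': 'CCGG'}
```

On well-formed input, ``dict(rebase_parser(rebase_file))`` should build the same mapping from the generator of ``rebase.py``. In both cases, an enzyme name that appears twice keeps only its last binding site.
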

Exercise
--------

...
@@ -61,23 +67,43 @@ solution 1
   :linenos:
   :language: python


:download:`multiple_fasta_reader.py <_static/code/multiple_fasta_reader.py>`

solution 2
^^^^^^^^^^

.. literalinclude:: _static/code/multiple_fasta_reader2.py
   :linenos:
   :language: python

:download:`multiple_fasta_reader2.py <_static/code/multiple_fasta_reader2.py>`

solution 3
^^^^^^^^^^

.. literalinclude:: _static/code/fasta_iterator.py
   :linenos:
   :language: python

:download:`fasta_iterator.py <_static/code/fasta_iterator.py>` .

The second version is an iterator: it returns the sequences one by one.
The advantage of this version is that, if the file contains a lot of sequences,
you do not have to load the whole file in memory.
With the first version, we have to load all the sequences before processing them;
if the file is huge (gigabytes), it can be a problem.

The second version allows reading the sequences one by one.
To do that, we have to open the file outside the reader function.
The FASTA format is very convenient for humans but not for parsers:
the end of a sequence is indicated only by the end of the file or by the beginning of the next sequence.
So with this version we have to play with the cursor, moving it backward
when we encounter a new sequence; the cursor is then at the right place
for the next sequence.
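
The cursor trick described above can be sketched as follows. This is not the code of ``multiple_fasta_reader2.py``, only an illustration under our own assumptions: the hypothetical function ``read_one_fasta`` receives an already opened file, remembers the position of each line with ``tell()``, and ``seek()``\ s back to it when it meets the header of the next sequence.

```python
import io


def read_one_fasta(fasta_file):
    """Return the next (header, sequence) of an open FASTA file, or None at the end.

    Hypothetical sketch: when the header of the following record is met,
    the cursor is moved back to the start of that line with seek().
    """
    header = None
    seq_parts = []
    while True:
        position = fasta_file.tell()      # start of the line about to be read
        line = fasta_file.readline()
        if not line:                      # end of file closes the current record
            break
        line = line.strip()
        if not line:
            continue                      # ignore empty lines
        if line.startswith('>'):
            if header is not None:
                # header of the NEXT record: step back so the next call reads it
                fasta_file.seek(position)
                break
            header = line[1:]
        elif header is not None:
            seq_parts.append(line)
    if header is None:
        return None
    return header, ''.join(seq_parts)


# The file is opened outside the reader, which is called once per sequence.
fasta_file = io.StringIO(">s1 first\nACGT\nGG\n>s2\nTTTT\n")
print(read_one_fasta(fasta_file))  # ('s1 first', 'ACGTGG')
print(read_one_fasta(fasta_file))  # ('s2', 'TTTT')
print(read_one_fasta(fasta_file))  # None
```

The sketch uses ``readline()`` rather than ``for line in fasta_file``: in Python 3, calling ``tell()`` on a text file while it is being iterated with ``next()`` raises an error.
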

The third version is an iterator and uses a generator.
Generators are functions which keep their state between two calls.
Generators do not use ``return`` to return a value but the keyword ``yield``.
Thus this implementation returns the sequences one by one without playing with the cursor.
You can use this function in a ``for`` loop or call ``next`` on it:
work with the sequence, then pass to the next sequence, and so on.

For instance, the most convenient way to use it is in a ``for`` loop: ::

    for seq in fasta_iter('my_fasta_file.fasta'):
        print(seq)

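
A generator-based reader in the spirit of the third version can be sketched as below. The name ``fasta_records`` is hypothetical (it is not the ``fasta_iter`` function of ``fasta_iterator.py``), and it accepts any iterable of lines, such as an open file, so that it can be tried on an in-memory string.

```python
import io


def fasta_records(lines):
    """Hypothetical generator yielding (header, sequence) one record at a time.

    `lines` may be an open file or any iterable of strings.
    """
    header = None
    seq_parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            if header is not None:
                # a new header ends the previous record: hand it to the caller
                yield header, ''.join(seq_parts)
            # execution resumes here on the next call, with the state intact
            header = line[1:]
            seq_parts = []
        elif header is not None:
            seq_parts.append(line)
    if header is not None:
        # the end of the input ends the last record
        yield header, ''.join(seq_parts)


records = fasta_records(io.StringIO(">s1\nACGT\nGG\n>s2\nTTTT\n"))
print(next(records))         # ('s1', 'ACGTGG')
for header, seq in records:
    print(header, len(seq))  # s2 4
```

Because ``header`` and ``seq_parts`` survive between two ``yield``, no cursor manipulation is needed, and only the current record is held in memory.
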
...
source/_static/code/rebase.py (new file, mode 100644, +39, -0)
def rebase_parser(rebase_file):
    """
    :param rebase_file: the rebase file to parse
    :type rebase_file: file object
    :return: at each iteration, a tuple (str enzyme name, str cleaned binding site)
    :rtype: iterator
    """
    def clean_seq(seq):
        """
        Remove every character which is not a base (A, C, G or T).
        """
        cleaned = ''
        for char in seq:
            if char in 'ACGT':
                cleaned += char
        return cleaned

    for line in rebase_file:
        fields = line.split()
        # skip blank or incomplete lines instead of raising IndexError
        if len(fields) < 3:
            continue
        # the enzyme name is the 1st field, the binding site the 3rd one
        name = fields[0]
        seq = clean_seq(fields[2])
        yield (name, seq)


if __name__ == '__main__':
    import sys
    import os.path

    if len(sys.argv) != 2:
        sys.exit("usage: rebase.py rebase_path")
    rebase_path = sys.argv[1]
    if not os.path.exists(rebase_path):
        sys.exit("No such file: {}".format(rebase_path))

    with open(rebase_path, 'r') as rebase_input:
        for enz in rebase_parser(rebase_input):
            print(enz)
\ No newline at end of file