Commit 04148644 authored by Blaise Li

Minor update in processing steps description.

Ribo-seq is still not up-to-date.
parent f6700ce5
@@ -98,6 +98,8 @@ paper = ref_info["paper"]
 submission_dir = ref_info["NCBI_submitter"]
 gather_processed_data = config.get("gather_processed_tables", False)
+if not gather_processed_data:
+    print("Only raw files will be gathered.")
 data_info = config["data"]
 LIBTYPES = list(data_info.keys())
......
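The two lines added in the hunk above make the default explicit: when `gather_processed_tables` is missing from the configuration, `config.get` falls back to `False` and a message announces that only raw files are gathered. A minimal sketch of that behaviour follows, using a plain dict as a stand-in for the real configuration object. The helper name `report_gathering_mode` is hypothetical and is not part of the commit.

```python
def report_gathering_mode(config):
    """Mirror the flag lookup and message from the added lines.

    `config` is assumed to behave like a dict; the function name is
    illustrative only and does not appear in the actual script.
    """
    gather_processed_data = config.get("gather_processed_tables", False)
    message = None
    if not gather_processed_data:
        message = "Only raw files will be gathered."
    return gather_processed_data, message


if __name__ == "__main__":
    # Key absent: the default (False) applies and the message is produced.
    print(report_gathering_mode({}))
    # Key set to True: no message.
    print(report_gathering_mode({"gather_processed_tables": True}))
```

Returning the message instead of printing it directly is a choice made here only so the behaviour is easy to check; the script itself prints.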
@@ -94,7 +94,7 @@ Ribo-seq:
 The 5' and 3' 4 nt UMIs were removed from the trimmed reads using cutadapt (version 1.18) with options -u 4 and -u -4
 After removing UMIs, the reads from 28 to 30 nt were selected using bioawk version 20110810 (git commit fd40150b7c557da45e781a999d372abbc634cc21)
 The size-selected reads were mapped on the C. elegans genome (WBcel235) using bowtie2 (version 2.3.4.3) with options -L 6 -i S,1,0.8 -N 0
-Mapped and remapped reads were used to estimate the abundance of structural RNAs using featureCounts (version 1.6.3) with options -O -s 1 --fracOverlap 1 and annotations corresponding to tRNA, snRNA, snoRNA, rRNA or RNA (as annotated in the iGenome distribution of WBcel235 obtained at ftp://igenome:G3nom3s4u@ussd-ftp.illumina.com/Caenorhabditis_elegans/Ensembl/WBcel235/Caenorhabditis_elegans_Ensembl_WBcel235.tar.gz)
+Mapped reads were used to estimate the abundance of structural RNAs using featureCounts (version 1.6.3) with options -O -s 1 --fracOverlap 1 and annotations corresponding to tRNA, snRNA, snoRNA, rRNA or RNA (as annotated in the iGenome distribution of WBcel235 obtained at ftp://igenome:G3nom3s4u@ussd-ftp.illumina.com/Caenorhabditis_elegans/Ensembl/WBcel235/Caenorhabditis_elegans_Ensembl_WBcel235.tar.gz)
 The abundance of non-structural RNAs was estimated by subtracting the above counts from the number of mapped and remapped reads.
 Initially mapped reads were classified using a custom python program according to their length, composition and on the annotations on which they mapped. Reads that didn't match miRNA and piRNA annotations were considered as potential endo-siRNAs.
 The potential endo-siRNAs of size 21 to 23 nt that started with G were classified as \"si_22G\" if they mapped antisense to annotation belonging to the following categories: DNA transposons, RNA transposons, satellites, simple repeats (as annotated in http://hgdownload.cse.ucsc.edu/goldenPath/ce11/database/rmsk.txt.gz) or pseudogene or protein-coding genes (as annotated in the iGenome distribution of WBcel235 obtained at ftp://igenome:G3nom3s4u@ussd-ftp.illumina.com/Caenorhabditis_elegans/Ensembl/WBcel235/Caenorhabditis_elegans_Ensembl_WBcel235.tar.gz)
......
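The processing steps described in the hunk above can be illustrated with a small pure-Python sketch. It is not the pipeline's code: the real steps use cutadapt, bioawk, featureCounts and a custom Python program that this diff does not show. All function names and the category labels in `SI_22G_TARGETS` are hypothetical, and summing the per-category counts before subtracting is an assumption about how "the above counts" are combined.

```python
# Hypothetical category labels; the real annotation names are not shown in this diff.
SI_22G_TARGETS = {
    "DNA_transposons", "RNA_transposons", "satellites",
    "simple_repeats", "pseudogene", "protein_coding",
}


def strip_umis(seq, umi_len=4):
    """Remove umi_len nt from both ends, as cutadapt -u 4 followed by -u -4 would."""
    if len(seq) <= 2 * umi_len:
        return ""  # nothing is left once both UMIs are removed
    return seq[umi_len:len(seq) - umi_len]


def in_size_range(seq, min_len=28, max_len=30):
    """Size selection step: keep reads of 28 to 30 nt (bounds inclusive)."""
    return min_len <= len(seq) <= max_len


def non_structural_count(n_mapped, structural_counts):
    """Subtract the structural RNA counts from the number of mapped reads.

    Assumption: the per-category counts are simply summed.
    """
    return n_mapped - sum(structural_counts.values())


def is_si_22G(seq, antisense, category):
    """Rule from the description: 21 to 23 nt, starts with G, maps antisense
    to one of the listed annotation categories."""
    return (
        21 <= len(seq) <= 23
        and seq.startswith("G")
        and antisense
        and category in SI_22G_TARGETS
    )


if __name__ == "__main__":
    read = "ACGT" + "G" + "A" * 27 + "TTTT"  # 4 nt UMI + 28 nt insert + 4 nt UMI
    insert = strip_umis(read)
    print(len(insert), in_size_range(insert))
    print(non_structural_count(1000, {"tRNA": 100, "rRNA": 250}))
    print(is_si_22G("G" + "A" * 21, antisense=True, category="protein_coding"))
```

The sketch treats each read as a bare sequence string; a real implementation would also carry the quality string through the trimming step and take strand and annotation category from the alignment.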