Commit ecef7618 authored by Marie Bourdon's avatar Marie Bourdon
Browse files

mark_estmap in tab

parent 45e067cd
...@@ -13,3 +13,5 @@ ...@@ -13,3 +13,5 @@
/home/marie/Documents/stuart_package/stuart/R/tab_mark.R="4023F220" /home/marie/Documents/stuart_package/stuart/R/tab_mark.R="4023F220"
/home/marie/Documents/stuart_package/stuart/R/write_rqtl.R="DA3DDAC1" /home/marie/Documents/stuart_package/stuart/R/write_rqtl.R="DA3DDAC1"
/home/marie/Documents/stuart_package/stuart/vignettes/stuart.Rmd="568884A2" /home/marie/Documents/stuart_package/stuart/vignettes/stuart.Rmd="568884A2"
/mnt/zeus/zeus_mouselab/marie/map_after/map_after.R="6CD0FCD2"
/mnt/zeus/zeus_mouselab/marie/stuart_estmap/stuart_estmap.R="E9861925"
#' @title Exclude markers depending on estimated genetic map #' @title Exclude markers depending on estimated genetic map
#' #'
#' @description This function uses a "map" object producted by qtl::est.map() and returns markers which have a high recombination proportion with adjacent markers. #' @description This function uses a "map" object producted by qtl::est.map() and returns markers which have a high recombination proportion with adjacent markers.
#' @param tab data frame obtained with tab_mark function.
#' @param map object of class map #' @param map object of class map
#' @param dist a value to identify markers with odd recombination fractions. Default is 100. #' @param dist a value to identify markers with odd recombination fractions. Default is 100.
#' #'
...@@ -8,9 +9,8 @@ ...@@ -8,9 +9,8 @@
#' #'
#' @export #' @export
#' #'
#### mark_estmap #### #### mark_estmap ####
mark_estmap <- function(map,dist=100){ mark_estmap <- function(tab,map,dist=100){
#initialize variables #initialize variables
marker <- c() marker <- c()
chr <- c() chr <- c()
...@@ -44,5 +44,8 @@ mark_estmap <- function(map,dist=100){ ...@@ -44,5 +44,8 @@ mark_estmap <- function(map,dist=100){
T ~ 0)) T ~ 0))
bad_markers <- tab_map %>% filter(exclude == 1) %>% pull(marker) bad_markers <- tab_map %>% filter(exclude == 1) %>% pull(marker)
return(bad_markers)
tab <- tab %>% mutate(exclude_estmap = case_when(marker %in% bad_markers ~ 1,
T ~ 0))
return(tab)
} }
...@@ -44,6 +44,10 @@ write_rqtl <- function(geno,pheno,tab,ref,par1,par2,par_N=TRUE,prefix,pos,path=N ...@@ -44,6 +44,10 @@ write_rqtl <- function(geno,pheno,tab,ref,par1,par2,par_N=TRUE,prefix,pos,path=N
tab <- tab %>% filter(exclude_allele==0) tab <- tab %>% filter(exclude_allele==0)
} }
if("exclude_estmap" %in% colnames(tab)){
tab <- tab %>% filter(exclude_estmap==0)
}
#filter genotypes for non excluded markers in geno file #filter genotypes for non excluded markers in geno file
geno <- geno %>% select(c(marker,id,allele_1,allele_2)) %>% filter(marker %in% tab$marker) geno <- geno %>% select(c(marker,id,allele_1,allele_2)) %>% filter(marker %in% tab$marker)
......
...@@ -4,9 +4,11 @@ ...@@ -4,9 +4,11 @@
\alias{mark_estmap} \alias{mark_estmap}
\title{Exclude markers depending on estimated genetic map} \title{Exclude markers depending on estimated genetic map}
\usage{ \usage{
mark_estmap(map, dist = 100) mark_estmap(tab, map, dist = 100)
} }
\arguments{ \arguments{
\item{tab}{data frame obtained with tab_mark function.}
\item{map}{object of class map} \item{map}{object of class map}
\item{dist}{a value to identify markers with odd recombination fractions. Default is 100.} \item{dist}{a value to identify markers with odd recombination fractions. Default is 100.}
......
No preview for this file type
No preview for this file type
...@@ -197,39 +197,31 @@ plotMap(stuart_cross,stuart_newmap,shift=TRUE) ...@@ -197,39 +197,31 @@ plotMap(stuart_cross,stuart_newmap,shift=TRUE)
It seems to be still 2 problematic markers. We can identify them with the `mark_estmap()`function. It seems to be still 2 problematic markers. We can identify them with the `mark_estmap()`function.
```{r mark_estmap} ```{r mark_estmap}
bad_markers <- mark_estmap(map=stuart_newmap,dist=100) tab2 <- mark_estmap(tab=tab2,map=stuart_newmap,dist=100)
print(bad_markers) tab2 %>% filter(exclude_estmap == 1) %>% print.data.frame()
``` ```
### Exclusion of last problematic markers
Let's focus on these markers.
```{r bad_markers_tab}
tab2 %>% filter(marker %in% bad_markers)
```
We can see that these two markers have correct proportions of each genotypes so they were not excluded. Let's see the alleles of the parental strains for these markers. We can see that these two markers have correct proportions of each genotypes so they were not excluded. Let's see the alleles of the parental strains for these markers.
```{r bad_markers_annot} ```{r bad_markers_annot}
strains %>% filter(marker %in% bad_markers) strains %>% filter(marker %in% c("S6J010381992","S6J205609960"))
``` ```
We can see that for these markers, one of the two parents has a missing genotype. In the cases where one parent has a missing of a heterozygous genotype, the `mark_allele()` function does not by default exclude the marker if the observed allele of the genotyped parent correspond to one of the alleles observed in the crossed individuals. We can see that for these markers, one of the two parents has a missing genotype. In the cases where one parent has a missing of a heterozygous genotype, the `mark_allele()` function does not by default exclude the marker if the observed allele of the genotyped parent correspond to one of the alleles observed in the crossed individuals.
### Exclusion of last problematic markers
To avoid the issues with these situations, you can use the `parNH=FALSE` argument in the `mark_allele()` function. This will exclude all the markers with one parent with a missing or a heterozygous genotype. However, this will exclude a very large number of markers. More than 1000 are in this situation in this cross, but about 700 were excluded with other filters. If we exclude all these markers with `parNH=FALSE` we would lose about 300 markers for only 2 problematic ones. To avoid the issues with these situations, you can use the `parNH=FALSE` argument in the `mark_allele()` function. This will exclude all the markers with one parent with a missing or a heterozygous genotype. However, this will exclude a very large number of markers. More than 1000 are in this situation in this cross, but about 700 were excluded with other filters. If we exclude all these markers with `parNH=FALSE` we would lose about 300 markers for only 2 problematic ones.
We rather advise you to keep these markers and identify possible problematic markers with `mark_estmap()` and exlude them. You have two possibilities to do so. First, you can drop these markers in the cross object with `qtl::drop.markers()`: We rather advise you to keep these markers and identify possible problematic markers with `mark_estmap()` and exlude them. You have two possibilities to do so. First, you can drop these markers in the cross object with `qtl::drop.markers()`:
```{r drop_markers} ```{r drop_markers}
newcross <- drop.markers(cross=stuart_cross,markers=bad_markers) newcross <- drop.markers(cross=stuart_cross,markers=c("S6J010381992","S6J205609960"))
``` ```
Or if you prefere to have a correct CSV object that can always be used for QTL analysis, you can delete these markers in the marker tab and create a new cross file with `write_rqtl()`: Or if you prefere to have a correct CSV object that can always be used for QTL analysis, you can delete these markers in the marker tab and create a new cross file with `write_rqtl()`:
```{r} ```{r}
tab2 <- tab2 %>% filter(!marker %in% bad_markers)
good_rqtl_file <- write_rqtl(geno=genos,pheno=phenos,tab=tab2, good_rqtl_file <- write_rqtl(geno=genos,pheno=phenos,tab=tab2,
ref=strains,par1="parent1",par2="parent2",prefix="ind_",pos="cM_cox") ref=strains,par1="parent1",par2="parent2",prefix="ind_",pos="cM_cox")
``` ```
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment