Commit a7316336 authored by Marie Bourdon's avatar Marie Bourdon
Browse files

creation of mark_estmap

parent 3b11ec6e
This diff is collapsed.
......@@ -2,18 +2,21 @@
/home/marie/Documents/stuart_package/stuart/.Rhistory="77548AEE"
/home/marie/Documents/stuart_package/stuart/.gitignore="05FA27A5"
/home/marie/Documents/stuart_package/stuart/DESCRIPTION="E286725B"
/home/marie/Documents/stuart_package/stuart/NAMESPACE="F97F4FA9"
/home/marie/Documents/stuart_package/stuart/R/geno_strains.R="DCEFD103"
/home/marie/Documents/stuart_package/stuart/R/genos-data.R="33820B87"
/home/marie/Documents/stuart_package/stuart/R/mark_allele.R="64630E28"
/home/marie/Documents/stuart_package/stuart/R/mark_estmap.R="51A23014"
/home/marie/Documents/stuart_package/stuart/R/mark_match.R="CBA3F514"
/home/marie/Documents/stuart_package/stuart/R/mark_poly.R="63E868BF"
/home/marie/Documents/stuart_package/stuart/R/mark_prop.R="19C6446D"
/home/marie/Documents/stuart_package/stuart/R/phenos-data.R="75BCF577"
/home/marie/Documents/stuart_package/stuart/R/stuart_cross-data.R="A0BD213E"
/home/marie/Documents/stuart_package/stuart/R/stuart_tab-data.R="9C18AF59"
/home/marie/Documents/stuart_package/stuart/R/tab_mark.R="60E85DC0"
/home/marie/Documents/stuart_package/stuart/R/write_rqtl.R="D25FAC55"
/home/marie/Documents/stuart_package/stuart/README.Rmd="C395B1B3"
/home/marie/Documents/stuart_package/stuart/README.md="8BBA9900"
/home/marie/Documents/stuart_package/stuart/man/genos.Rd="383A8DC0"
/home/marie/Documents/stuart_package/stuart/vignettes/stuaRt.Rmd="007031F6"
/home/marie/Documents/stuart_package/stuart/vignettes/stuart.Rmd="1F99EBA0"
/mnt/gaia/gaia_mouselab/Marie/Package_stuaRt/Article/Figures/Maestro/map_before_neogen/map_before_neogen.R="36D40A63"
......@@ -3,3 +3,4 @@ Meta
.Rproj.user
/doc/
/Meta/
.Rhistory
......@@ -2,6 +2,7 @@
export(geno_strains)
export(mark_allele)
export(mark_estmap)
export(mark_match)
export(mark_poly)
export(mark_prop)
......
#' @title Exclude markers depending on estimated genetic map
#'
#' @description This function uses a "map" object producted by qtl::est.map() and returns markers which have a high recombination proportion with adjacent markers.
#' @param map object of class map
#' @param dist a value to identify markers with odd recombination fractions. Default is 100.
#'
#' @import dplyr
#'
#' @export
#'
#### mark_estmap ####
mark_estmap <- function(map,dist=100){
#initialize variables
marker <- c()
chr <- c()
pos <- c()
place <- c()
previous <- c()
follow <- c()
#get information in newmap
for(i in names(map)){
marker <- c(marker,names(map[[i]]))
chr <- c(chr,rep(i,times=length(map[[i]])))
pos <- c(pos,unname(map[[i]]))
place <- c(place,"first",rep("middle",times=(length(map[[i]])-2)),"last")
prev <- c(NA,unname(map[[i]])[1:length(map[[i]])-1])
previous <- c(previous,prev)
fol <- c(unname(map[[i]])[2:length(map[[i]])],NA)
follow <- c(follow,fol)
}
tab_map <- tibble(marker = marker,
chr = chr,
place = place,
pos = pos,
previous = pos-previous,
follow = follow-pos)
tab_map <- tab_map %>% mutate(exclude=case_when(place == "first" & follow > dist ~ 1,
place == "middle" & previous > dist & follow > dist ~ 1,
place == "last" & previous > dist ~ 1,
T ~ 0))
bad_markers <- tab_map %>% filter(exclude == 1) %>% pull(marker)
return(bad_markers)
}
#' F2 cross
#'
#' A cross with 176 F2
#'
#' @format A large f2
"stuart_cross"
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/mark_estmap.R
\name{mark_estmap}
\alias{mark_estmap}
\title{Exclude markers depending on estimated genetic map}
\usage{
mark_estmap(map, dist = 100)
}
\arguments{
\item{map}{object of class map}
\item{dist}{a value to identify markers with odd recombination fractions. Default is 100.}
}
\description{
This function uses a "map" object producted by qtl::est.map() and returns markers which have a high recombination proportion with adjacent markers.
}
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/stuart_cross-data.R
\docType{data}
\name{stuart_cross}
\alias{stuart_cross}
\title{F2 cross}
\format{
A large f2
}
\usage{
stuart_cross
}
\description{
A cross with 176 F2
}
\keyword{datasets}
This diff is collapsed.
No preview for this file type
No preview for this file type
......@@ -95,7 +95,12 @@ data(stuart_tab)
summary(stuart_tab)
```
Then we will use the different mark_* functions in order to filter the markers. First, we can use `mark_match()`` function. Here, the parental strains were genotyped with the F2 individuals, but it can happen that you use previous genotyping results for the parental strains. `mark_match()` function excludes markers that are in your genotype file but not in the reference genotype dataset. We recomend using this function as the chip used for genotyping may change.
Then we will use the different mark_* functions in order to filter the markers.
### mark_match
First, we can use `mark_match()`` function. Here, the parental strains were genotyped with the F2 individuals, but it can happen that you use previous genotyping results for the parental strains. `mark_match()` function excludes markers that are in your genotype file but not in the reference genotype dataset. We recomend using this function as the chip used for genotyping may change.
```{r mark_match}
tab2 <- mark_match(stuart_tab,ref=strains)
......@@ -104,6 +109,8 @@ tab2 %>% filter(exclude_match==1) %>% print.data.frame()
Here the reference strains were genotyped with the same version of the chip as the F2 individuals so no marker was excluded.
### mark_poly
Then, we can use the `mark_poly()` function, which will exclude the markers that are not polymorphic.
```{r mark_poly ex}
......@@ -111,12 +118,21 @@ tab2 <- mark_poly(tab2)
head(tab2) %>% print.data.frame()
```
The `mark_prop()` function can be used to filter markers depending on the proportion of each genotype. Here, we have a F2 so we can use the "homo" argument in order to filter depending on the proportion of both homozygous genotype. If we have a N2, we can filter with the proportion of homozygous individuals with the "homo" argument and of heterozygous individuals with the "hetero" argument. Moreover, this function allows to filter marker depending on the proportion on non genotyped animals. By defaults, markers for which more than 50% of individuals were not genotyped.
### mark_prop
The `mark_prop()` function can be used to filter markers depending on the proportion of each genotype. Here, we have a F2 and we use `homo=0.1, hetero=0.1` so the function will exclude all markers with less than 10% of each genotype. Moreover, this function allows to filter marker depending on the proportion on non genotyped animals. By defaults, markers for which more than 50% of individuals were not genotyped.
```{r mark_prop ex}
```{r mark_prop_ex_homo}
tab2 <- mark_prop(tab2,cross="F2",homo=0.1,hetero=0.1)
head(tab2) %>% print.data.frame()
```
We could also use the `pval` argument which allows to exclude markers by performing a Chi2 test comparing observed distribution with Mendelian proportions. By using `pval=0.5` we would exclude all markers with a Chi2 p-value inferior to 0.05. However, for some markers, Chi2 approximation may be incorrect.
```{r mark_prop_ex_pval,warning=FALSE}
mark_prop(tab2,cross="F2",pval=0.05) %>% head() %>% print.data.frame()
```
### mark_allele
Last, we can use the `mark_allele()` function. This very helpful function excludes markers for which the alleles found in the F2/N2 individuals do not correspond to the alleles found in the parental strains. For example, if for a marker is not polymorphic in the parental strains but we found two alleles in the F2/N2 individuals, it will be excluded.
......@@ -133,16 +149,83 @@ strains %>% filter(marker %in% c("gJAX00038569","gJAX00425031","gUNC12245354",
select(marker,parent1,parent2) %>% print.data.frame()
```
# Creation of the R/qtl file
## Creation of the R/qtl file
After excluding the problematic markers, we can create the R/qtl file. The individuals must have the same ID in the geno and in the pheno file. If there is a prefix in the geno file that must be removed in order to acheive this, you can use the "prefix" argument. The "path" argument can be used in order to create a CSV file that you can laod with `qtl::read.cross`.
```{r write_qtl}
rqtl_file <- write_rqtl(geno=genos,pheno=phenos,tab=tab2,
ref=strains,par1="parent1",par2="parent2",prefix="ind_",pos="cM_cox")
ref=strains,par1="parent1",par2="parent2",prefix="ind_",pos="cM_cox",path = "rqtl_file.csv")
rqtl_file[1:10,1:7] %>% print.data.frame()
```
## Check markers with est.map()
The best way to verify that the markers are correctly sorted is to calculate the estimated map with R/qtl. Indeed, if there is a problematic marker, it will present high recombination fraction with adjacent markers.
Here we can load the cross object that was created with the previous steps of the pipeline as well as the newmap that was produced with the function `est.map(cross=stuart_cross,error.prob=0.01)`.
```{r load_cross}
library(qtl)
data(stuart_cross)
summary(stuart_cross)
##### TO RUN #####
# newmap <- est.map(cross=cross,error.prob=0.01)
# save the map as data in the package
# data(stuart_newmap)
# summary(stuart_newmap)
```
Then we can plot the newmap to compare it with the positions found in the annotation file.
```{r plot_map}
# plotMap(stuart_cross,stuart_newmap,shift=TRUE)
```
### mark_estmap
It seems to be still 2 problematic markers. We can identify them with the `mark_estmap()`function.
```{r mark_estmap}
# bad_markers <- mark_estmap(map=newmap_after,dist=100)
# print(bad_markers)
```
### Exclusion of last problematic markers
Let's focus on these markers.
```{r bad_markers_tab}
tab2 %>% filter(marker %in% c("S6J010381992","S6J205609960"))
# tab2 %>% filter(marker %in% bad_markers)
```
We can see that these two markers have correct proportions of each genotypes so there not excluded. Let's see the alleles of the parental strains for these markers.
```{r bad_markers_annot}
strains %>% filter(marker %in% c("S6J010381992","S6J205609960"))
# strains %>% filter(marker %in% bad_markers)
```
We can see that for these markers, one of the two parents has a missing genotype. In the cases where one parent has a missing of a heterozygous genotype, the `mark_allele()` function does not by default exclude the marker if the observed allele of the genotyped parent correspond to one of the alleles observed in the crossed individuals.
To avoid the issues with these situations, you can use the `parNH=FALSE` argument in the `mark_allele()` function. This will exclude all the markers with one parent with a missing or a heterozygous genotype. However, this will exclude a very large number of markers. More than 1000 are in this situation in this cross, but about 700 were excluded with other filters. If we exclude all these markers with `parNH=FALSE` we would lose about 300 markers for only 2 problematic ones.
We rather advise you to keep these markers and identify possible problematic markers with `mark_estmap()` and exlude them. You have two possibilities to do so. First, you can drop these markers in the cross object with `qtl::drop.markers()`:
```{r drop_markers}
# newcross <- drop.markers(cross=stuart_cross,markers=bad_markers)
```
Or if you prefere to have a correct CSV object that can always be used for QTL analysis, you can delete these markers in the marker tab and create a new cross file with `write_rqtl()`:
```{r}
# tab2 <- tab2 %>% filter(!marker %in% bad_markers)
#
# good_rqtl_file <- write_rqtl(geno=genos,pheno=phenos,tab=tab2,
# ref=strains,par1="parent1",par2="parent2",prefix="ind_",pos="cM_cox",path = "rqtl_file.csv")
```
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment