Commit a425f3b3 authored by Marie Bourdon

add tab data

parent 3f386707
No preview for this file type
@@ -114,20 +114,35 @@ head(rqtl_file)
rqtl_file[10,10]
rqtl_file[1:10,1:10]
save.image()
library(dplyr)
library(stuart)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
library(dplyr)
urlfile <- "https://github.com/kbroman/MUGAarrays/blob/master/UWisc/mini_uwisc_v2.csv"
annot_mini <- read.csv(urlfile)
annot_mini <- read.csv(url(urlfile))
annot_mini <- read_csv(url(urlfile))
annot_mini <- read.csv(url("https://github.com/kbroman/MUGAarrays/blob/master/UWisc/mini_uwisc_v2.csv"))
View(annot_mini)
annot_mini <- read.csv(url(urlfile))
View(annot_mini)
annot_mini <- read.csv(url("https://github.com/kbroman/MUGAarrays/blob/master/UWisc/mini_uwisc_v2.csv"))
View(annot_mini)
library(stuart)
annot_mini <- read.csv(url("https://raw.githubusercontent.com/kbroman/MUGAarrays/master/UWisc/mini_uwisc_v2.csv"))
View(annot_mini)
rm(urlfile)
View(annot_mini)
View(annot_mini)
View(tab)
tab %>% select(-exclude)
tab %<>% select(-exclude)
tab <- tab %>% select(-exclude)
usethis::use_data(tab)
View(tab)
library(dplyr)
library(stuart)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
library(dplyr)
library(stuart)
data(genos)
data(genos)
summary(genos)
data(phenos)
summary(phenos)
strains <- geno_strains(ref=annot_mini,geno=genos,par1=c("StrainsA_1","StrainsA_2"),par2=c("StrainsB_1","StrainsB_2"),name1="parent1",name2="parent2")
head(strains)
genos <- genos %>% filter(!Sample.ID %in% c("StrainsA_1", "StrainsA_2", "StrainsB_1","StrainsB_2"))
data(stuart_tab)
#' @title Exclude markers depending on the proportions of homozygous and heterozygous individuals
#'
#' @description This function uses the data frame produced by the tab_mark function and fills the "exclude" column for all the markers that show unexpected proportions of each genotype. You can set these proportions with the arguments of the function.
#' @param tab data frame obtained with tab_mark function
#' @param cross F2 or N2. If F2, markers are excluded according to the proportion of each homozygous genotype (see "homo" argument). If N2, markers are excluded according to the proportions of heterozygous and homozygous individuals (see "homo" and "hetero" arguments)
#' @param homo proportion of homozygous individuals under which the marker is excluded. Will apply on both homozygous genotypes for an F2, but only on one for N2
#' @param hetero F2 or N2. Proportion of heterozygous individuals under which the marker is excluded
#' @param na proportion of non-genotyped individuals under which the marker is excluded
#' @description Uses the data frame produced by the tab_mark function and fills the "exclude" column for all the markers that show unexpected proportions of each genotype. You can set these proportions with the arguments of the function.
#' @param tab data frame obtained with tab_mark function.
#' @param cross F2 or N2.
#' @param homo proportion of homozygous individuals under which the marker is excluded. Applies to both homozygous genotypes for an F2, but only to one for an N2.
#' @param hetero proportion of heterozygous individuals under which the marker is excluded.
#' @param na proportion of non-genotyped individuals above which the marker is excluded.
#'
#' @import dplyr
#'
......
#' Output of tab_mark function
#'
#' A dataset containing the output of the tab_mark() function.
#'
#' @format A data frame with 11125 rows and 7 variables
#' \describe{
#' \item{SNP.Name}{name of the marker}
#' \item{Allele_1}{first allele of the marker}
#' \item{Allele_2}{second allele of the marker}
#' \item{n_HM1}{number of homozygous individuals for the first allele}
#' \item{n_HM2}{number of homozygous individuals for the second allele}
#' \item{n_HT}{number of heterozygous individuals}
#' \item{n_NA}{number of non-genotyped individuals}
#' }
"stuart_tab"
@@ -7,16 +7,16 @@
mark_prop(tab, cross, homo = NA, hetero = NA, na = 0.5)
}
\arguments{
\item{tab}{data frame obtained with tab_mark function}
\item{tab}{data frame obtained with tab_mark function.}
\item{cross}{F2 or N2. If F2, markers are excluded according to the proportion of each homozygous genotype (see "homo" argument). If N2, markers are excluded according to the proportions of heterozygous and homozygous individuals (see "homo" and "hetero" arguments)}
\item{cross}{F2 or N2.}
\item{homo}{proportion of homozygous individuals under which the marker is excluded. Will apply on both homozygous genotypes for an F2, but only on one for N2}
\item{homo}{proportion of homozygous individuals under which the marker is excluded. Applies to both homozygous genotypes for an F2, but only to one for an N2.}
\item{hetero}{F2 or N2. Proportion of heterozygous individuals under which the marker is excluded}
\item{hetero}{proportion of heterozygous individuals under which the marker is excluded.}
\item{na}{proportion of non-genotyped individuals under which the marker is excluded}
\item{na}{proportion of non-genotyped individuals above which the marker is excluded.}
}
\description{
This function uses the data frame produced by the tab_mark function and fills the "exclude" column for all the markers that show unexpected proportions of each genotype. You can set these proportions with the arguments of the function.
Uses the data frame produced by the tab_mark function and fills the "exclude" column for all the markers that show unexpected proportions of each genotype. You can set these proportions with the arguments of the function.
}
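As an illustration of the filtering rule documented above, the following base R sketch flags markers from tab_mark()-style counts. It is not the stuaRt implementation of mark_prop(): the function name `flag_proportions()` is hypothetical, genotype proportions are assumed to be computed over genotyped individuals, the missing-data proportion over all individuals, and for an N2 the `homo` threshold is assumed to apply to the more frequent homozygous class.

```r
# Hypothetical sketch of the rule documented for mark_prop(); NOT the stuaRt code.
# Assumptions: genotype proportions use genotyped individuals as denominator,
# the NA proportion uses all individuals, and for an N2 the `homo` threshold
# is checked on the more frequent homozygous class.
flag_proportions <- function(tab, cross, homo = NA, hetero = NA, na = 0.5) {
  n_tot  <- tab$n_HM1 + tab$n_HM2 + tab$n_HT + tab$n_NA
  n_geno <- n_tot - tab$n_NA
  p_hm1 <- tab$n_HM1 / n_geno
  p_hm2 <- tab$n_HM2 / n_geno
  p_ht  <- tab$n_HT / n_geno
  p_na  <- tab$n_NA / n_tot

  # Too many non-genotyped individuals
  excl <- p_na > na
  if (!is.na(homo)) {
    if (cross == "F2") {
      # Both homozygous classes must reach the threshold
      excl <- excl | p_hm1 < homo | p_hm2 < homo
    } else if (cross == "N2") {
      excl <- excl | pmax(p_hm1, p_hm2) < homo
    } else {
      stop("cross must be \"F2\" or \"N2\"")
    }
  }
  if (!is.na(hetero)) {
    excl <- excl | p_ht < hetero
  }
  # Markers whose proportions cannot be computed (NA result) are excluded
  tab$exclude <- as.integer(is.na(excl) | excl)
  tab
}

tab <- data.frame(
  SNP.Name = c("m1", "m2", "m3"),
  n_HM1 = c(25, 2, 20),
  n_HM2 = c(25, 48, 20),
  n_HT  = c(50, 50, 10),
  n_NA  = c(0, 0, 60)
)
flag_proportions(tab, cross = "F2", homo = 0.1)$exclude
#> [1] 0 1 1
```

Here `m2` is excluded because only 2% of genotyped individuals are homozygous for the first allele, and `m3` because more than half of the individuals are not genotyped.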
Version: 1.0
RestoreWorkspace: No
SaveWorkspace: No
AlwaysSaveHistory: Default
EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8
RnwWeave: Sweave
LaTeX: pdfLaTeX
AutoAppendNewline: Yes
StripTrailingWhitespace: Yes
LineEndingConversion: Posix
BuildType: Package
PackageUseDevtools: Yes
PackageInstallArgs: --no-multiarch --with-keep.source
PackageRoxygenize: rd,collate,namespace
@@ -27,8 +27,8 @@ The examples shown here require the use of dplyr package.
```{r setup}
library(stuart)
library(dplyr)
library(stuart)
```
@@ -38,7 +38,7 @@ The developer of Rqtl and Rqtl2 packages, Karl Broman, realised that the annotat
We recommend using these annotation files to reconstruct the file used for the Rqtl analysis. You can load the datasets containing these annotations from GitHub (https://github.com/kbroman/MUGAarrays/tree/master/UWisc). Choose the file corresponding to the MUGA array that you used and load it in R from its raw-file URL on raw.githubusercontent.com: the regular GitHub page URL returns an HTML page, not the CSV file.
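If you copied the address of the GitHub page, a small base R helper can rewrite it into the raw-file URL. The helper `blob_to_raw()` is hypothetical (not part of stuaRt) and assumes the usual `github.com/<user>/<repo>/blob/<branch>/<path>` page layout.

```r
# Hypothetical helper (not part of stuaRt): convert a GitHub "blob" page URL
# into the corresponding raw-file URL that read.csv() can parse.
blob_to_raw <- function(url) {
  # https://github.com/<user>/<repo>/blob/<branch>/<path>
  #   -> https://raw.githubusercontent.com/<user>/<repo>/<branch>/<path>
  sub("^https://github\\.com/([^/]+)/([^/]+)/blob/",
      "https://raw.githubusercontent.com/\\1/\\2/",
      url)
}

blob_url <- "https://github.com/kbroman/MUGAarrays/blob/master/UWisc/mini_uwisc_v2.csv"
raw_url <- blob_to_raw(blob_url)
raw_url
#> [1] "https://raw.githubusercontent.com/kbroman/MUGAarrays/master/UWisc/mini_uwisc_v2.csv"
# annot_mini <- read.csv(url(raw_url))   # requires network access
```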
Here, we present an example of the use of stuaRt with the results of an F2 cross genotyped with miniMUGA. We load the annotation file for miniMUGA (`annot_mini`), the Neogen genotyping results (`genos`) and the phenotype dataset produced by the lab (`phenos`). All these datasets are available as examples in the stuaRt package.
Here, we present an example of the use of stuaRt with the results of an F2 cross genotyped with miniMUGA. We load the Neogen genotyping results (`genos`) and the phenotype dataset produced by the lab (`phenos`). Both datasets are available as examples in the stuaRt package.
```{r annot}
annot_mini <- read.csv(url("https://raw.githubusercontent.com/kbroman/MUGAarrays/master/UWisc/mini_uwisc_v2.csv"))
@@ -52,6 +52,7 @@ summary(genos)
data(phenos)
summary(phenos)
```
### Genotyping of parental strains
To use the genotyping results for the Rqtl analysis, we need to recode the genotype of each individual (originally encoded with the alleles A, T, G and C) according to the genotypes of the parental strains: homozygous for the first parental strain (0), heterozygous (1) or homozygous for the second parental strain (2).
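The recoding described above can be sketched for a single marker in base R. This is an illustration, not the stuaRt code: `recode_marker()` is a hypothetical name, and it assumes that each parental strain is homozygous for one known allele and that missing calls are written as "-".

```r
# Hypothetical helper, not part of stuaRt: recode one marker's calls to 0/1/2.
# par1_allele / par2_allele are the alleles carried by the two parental strains.
recode_marker <- function(allele1, allele2, par1_allele, par2_allele) {
  # Non-informative marker: the parents cannot be told apart
  if (is.na(par1_allele) || is.na(par2_allele) || par1_allele == par2_allele) {
    return(rep(NA_integer_, length(allele1)))
  }
  # Number of alleles inherited from each parental strain
  n_par1 <- (allele1 == par1_allele) + (allele2 == par1_allele)
  n_par2 <- (allele1 == par2_allele) + (allele2 == par2_allele)
  # A valid call carries exactly two parental alleles; "-" no-calls and
  # unexpected alleles become NA
  ok <- !is.na(n_par1) & !is.na(n_par2) & (n_par1 + n_par2 == 2)
  # 0 = homozygous parent 1, 1 = heterozygous, 2 = homozygous parent 2
  ifelse(ok, n_par2, NA_integer_)
}

res <- recode_marker(allele1 = c("A", "A", "G", "-"),
                     allele2 = c("A", "G", "G", "-"),
                     par1_allele = "A", par2_allele = "G")
res
#> [1]  0  1  2 NA
```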
@@ -75,18 +76,18 @@ genos <- genos %>% filter(!Sample.ID %in% c("StrainsA_1", "StrainsA_2", "Strains
### Marker tab
The first step of the marker sorting is to create the marker data frame with the tab_mark() function. For each marker, this data frame contains the two alleles that can be found in the F2/N2 population (`Allele_1` and `Allele_2`), the number of individuals of each genotype (homozygous for the first allele, `n_HM1`; homozygous for the second allele, `n_HM2`; heterozygous, `n_HT`) and the number of non-genotyped individuals (`n_NA`). This step can take several minutes.
The first step of the marker sorting is to create the marker data frame with the tab_mark() function. For each marker, this data frame contains the two alleles that can be found in the F2/N2 population (`Allele_1` and `Allele_2`), the number of individuals of each genotype (homozygous for the first allele, `n_HM1`; homozygous for the second allele, `n_HM2`; heterozygous, `n_HT`) and the number of non-genotyped individuals (`n_NA`). This step can take several minutes. You can also load the precomputed output of this function, the `stuart_tab` dataset.
```{r tab_mark}
tab <- tab_mark(geno=genos)
head(tab)
data(stuart_tab)
summary(stuart_tab)
```
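To make the columns of this table concrete, here is a base R sketch that computes the same kind of counts from allele calls. It is not how tab_mark() is implemented: `count_genotypes()` is a hypothetical name, the input columns `SNP.Name`, `Allele1` and `Allele2` (one row per individual and marker, "-" for a missing call) are assumptions about the layout, and alleles are ordered alphabetically by choice.

```r
# Hypothetical sketch of the counts stored in the tab_mark() output; NOT the
# stuaRt code. Input layout (SNP.Name, Allele1, Allele2, "-" = no call) is assumed.
count_genotypes <- function(geno) {
  per_marker <- lapply(split(geno, geno$SNP.Name), function(g) {
    called <- g$Allele1 != "-" & g$Allele2 != "-"
    # Observed alleles, sorted alphabetically (assumed ordering)
    alleles <- sort(unique(c(g$Allele1[called], g$Allele2[called])))
    a1 <- alleles[1]
    a2 <- if (length(alleles) > 1) alleles[2] else NA_character_
    # %in% returns FALSE (not NA) when an allele is missing
    is_geno <- function(x, y) called & g$Allele1 %in% x & g$Allele2 %in% y
    data.frame(
      SNP.Name = g$SNP.Name[1],
      Allele_1 = a1,
      Allele_2 = a2,
      n_HM1 = sum(is_geno(a1, a1)),
      n_HM2 = sum(is_geno(a2, a2)),
      n_HT  = sum(is_geno(a1, a2) | is_geno(a2, a1)),
      n_NA  = sum(!called),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, per_marker)
  rownames(out) <- NULL
  out
}

geno_demo <- data.frame(
  SNP.Name = rep(c("m1", "m2"), each = 4),
  Allele1  = c("A", "A", "G", "-", "C", "C", "C", "T"),
  Allele2  = c("A", "G", "G", "-", "C", "T", "C", "T"),
  stringsAsFactors = FALSE
)
tab_demo <- count_genotypes(geno_demo)
tab_demo
```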
Then we use the mark_* functions to filter the markers. First, we can use the mark_match() function, which excludes markers that are in your genotype file but not in the reference genotype dataset. We recommend using this function because the chip used for genotyping may change.
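The idea behind this matching step fits in a few lines of base R. This sketch is not the mark_match() source: `flag_unmatched()` is a hypothetical name, and the reference is assumed to be given as a character vector of marker names rather than the `strains` data frame used below.

```r
# Hypothetical sketch of the idea behind mark_match(); NOT the stuaRt code.
# ref_markers is assumed to be a character vector of the marker names present
# in the reference genotype dataset.
flag_unmatched <- function(tab, ref_markers) {
  # 1 = marker genotyped in your cross but absent from the reference
  tab$exclude_match <- as.integer(!tab$SNP.Name %in% ref_markers)
  tab
}

tab_demo <- data.frame(SNP.Name = c("m1", "m2", "m3"), stringsAsFactors = FALSE)
flag_unmatched(tab_demo, ref_markers = c("m1", "m3"))$exclude_match
#> [1] 0 1 0
```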
```{r mark_match}
tab2 <- mark_match(tab,ref=strains)
tab2 <- mark_match(stuart_tab,ref=strains)
tab2 %>% filter(exclude_match==1)
......
No preview for this file type