Commit 404f913e authored by Marie Bourdon's avatar Marie Bourdon
Browse files

change tab_mark and stuart_tab

parent df999311
/Users/mariebourdon/Documents/PhD/stuart_package/stuart/DESCRIPTION="CE03CA27"
/Users/mariebourdon/Documents/PhD/stuart_package/stuart/R/geno_strains.R="7765CB4E"
/Users/mariebourdon/Documents/PhD/stuart_package/stuart/R/mark_allele.R="8251F7A9"
/Users/mariebourdon/Documents/PhD/stuart_package/stuart/R/mark_match.R="21CF8B2B"
/Users/mariebourdon/Documents/PhD/stuart_package/stuart/R/mark_poly.R="7D22AD56"
/Users/mariebourdon/Documents/PhD/stuart_package/stuart/R/mark_prop.R="567DA991"
/Users/mariebourdon/Documents/PhD/stuart_package/stuart/R/stuart_tab-data.R="95181F86"
/Users/mariebourdon/Documents/PhD/stuart_package/stuart/R/tab_mark.R="905F7322"
/Users/mariebourdon/Documents/PhD/stuart_package/stuart/R/write_rqtl.R="96B413EC"
/Users/mariebourdon/Documents/PhD/stuart_package/stuart/vignettes/stuaRt.Rmd="541E35A1"
/home/marie/Documents/stuart_package/stuart/vignettes/stuart.Rmd="568884A2"
#' @title Create haplotype for a new mouse strain into a reference dataframe
#' @title Create haplotype for a new mouse strain into a dataframe
#'
#' @description This functions adds columns for parental strains used in the cross in the annotation data frame, from the genotype data frame in which one or several animal of the parental strains were genotyped.
#' If several animals of one strain were genotyped, a consensus is created from these animals.
......
......@@ -52,9 +52,9 @@ mark_prop <- function(tab,cross,homo=NA,hetero=NA,pval=NA,na=0.5){
tab <- tab %>%
mutate(exclude_prop=case_when(p_NA > na ~ 1,
cross=="F2" & (p_HM1 < homo | p_HM2 < homo | p_HT < hetero) ~ 1,
cross=="N2" & ((p_HM2 == 0 & p_HM1 < homo) |
mutate(exclude_prop=case_when(!chr %in% c("X","Y","M") & p_NA > na ~ 1,
!chr %in% c("X","Y","M") & cross=="F2" & (p_HM1 < homo | p_HM2 < homo | p_HT < hetero) ~ 1,
!chr %in% c("X","Y","M") & cross=="N2" & ((p_HM2 == 0 & p_HM1 < homo) |
(p_HM1 == 0 & p_HM2 < homo) |
(p_HT < hetero) |
(p_HM1 !=0 & p_HM2 != 0)) ~ 1,
......
......@@ -2,9 +2,11 @@
#'
#' A dataset with the output of tab_mark() function.
#'
#' @format A data frame with 11125 rows and 7 variables
#' @format A data frame with 11125 rows and 9 variables
#' \describe{
#' \item{marker}{name of the marker}
#' \item{chr}{chromosome of the marker}
#' \item{cM_cox}{position of the marker}
#' \item{allele_1}{first allele of the marker}
#' \item{allele_2}{second allele of the marker}
#' \item{n_HM1}{number of homozygous individuals for the first allele}
......
Version: 1.0
RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default
EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8
RnwWeave: Sweave
LaTeX: pdfLaTeX
---
title: "article_figures.Rmd"
author: "Marie Bourdon"
date: "July 2021"
output: html_document
---
## Goal and raw data
The goal of this script is to produce figure for the stuart package manuscript.
This scripts uses data from the package, and other files found in the files directory.
This script uses the qtl_plot() ggplot based function to plot QTL mapping results. This function is in the script "QTL_plot.R".
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(magrittr)
library(qtl)
library(cowplot)
library(grid)
library(gridExtra)
library(gridGraphics)
library(stuart)
source("files/QTL_plot.R")
```
## Data load and use of stuart functions
### genos dataframe
Data frame from stuart with genotypes of 176 F2 individuals and parental strains.
```{r genos}
data(genos)
summary(genos)
```
### phenos dataframe
Data frame from stuart with phenotypes of 176 F2 individuals for a quantitative trait.
```{r phenos}
data(phenos)
summary(phenos)
```
```{r annot}
annot_mini <- read.csv(url("https://raw.githubusercontent.com/kbroman/MUGAarrays/master/UWisc/mini_uwisc_v2.csv"))
```
```{r strains}
#our genotypes
strains <- geno_strains(ref=annot_mini,geno=genos,par1=c("StrainsA_1","StrainsA_2"),par2=c("StrainsB_1","StrainsB_2"),name1="parent1",name2="parent2")
genos %<>% filter(!Sample.ID %in% c("StrainsA_1","StrainsA_2","StrainsB_1","StrainsB_2"))
#load parental strains data from Neogen
strns_neogen <- read.csv("git/STRNS.CSV")
# join with annotation file from miniMUGA
strns_neogen <- strns_neogen %>% select(name, parent1,parent2) %>% right_join(annot_mini,.,by=c("marker"="name"))
strns_neogen %<>% mutate(parent1=str_to_upper(parent1))
strns_neogen %<>% mutate(parent2=str_to_upper(parent2))
```
library(ggplot2)
library(ggrepel)
library(tidyr)
library(dplyr)
library(grid)
#Modified from https://shiring.github.io/ggplot2/2017/02/12/qtl_plots
qtl_plot <- function(input, # data frame input from scanone
mult.pheno = FALSE, # multiple phenotypes?
model = "normal", # model used in scanone
chrs = NA, # chromosomes to display
lod = NA, # LOD threshold
rug = FALSE, # plot marker positions as rug?
ncol = NA, # number of columns for facetting
labels = NA, # optional dataframe to plot QTL labels
ylab = "LOD", #optional name for y axis
title = "", #optional name for the graph
x.label = element_text(), #optional: use element_blank to remove labels
size = 1,
int = NA #optional confidence interval df: 3 columns chr, pos and lod with 3 lines: first limit, peak and second limit
) {
# if we have multiple phenotypes and/or a 2part model, gather input
if (mult.pheno & model == "2part") {
input <- gather(input, group, lod, grep("pheno", colnames(input)))
} else if (mult.pheno) {
input <- gather(input, group, lod, grep("pheno", colnames(input)))
} else if (model == "2part") {
input <- gather(input, method, lod, lod.p.mu:lod.mu)
}
# if not all chromosomes should be displayed, subset input
if (!is.na(chrs)[1]) {
input <- input[as.character(input$chr) %in% chrs, ]
}
# if there is more than one LOD column, gather input
if (!any(colnames(input) == "lod")) {
input$lod <- input[, grep("lod", colnames(input))]
}
# if no number of columns for facetting is defined, plot all in one row
if (is.na(ncol)) {
ncol <- length(unique(input$chr))
}
# if labels are set and there is no name column, set from rownames
if (!is.na(labels)[1]) {
if (is.null(labels$name)) {
labels$name <- rownames(labels)
}
}
# plot input data frame position and LOD score
plot <- ggplot(input, aes(x = pos, y = lod)) + {
# if LOD threshold is given, plot as horizontal line
if (!is.na(lod)[1] & length(lod) == 1) geom_hline(yintercept = lod, linetype = "solid")
} + {
if (!is.na(lod)[1] & length(lod) > 1) geom_hline(data = lod, aes(yintercept = lod, linetype = group))
} + {
# plot rug on bottom, if TRUE
if (rug) geom_rug(size = 0.1, sides = "b")
} + {
# if input has column method but not group, plot line and color by method
if (!is.null(input$method) & is.null(input$group)) geom_line(aes(color = method), size = size, alpha = 0.6)
} + {
# if input has column group but not method, plot line and color by group
if (!is.null(input$group) & is.null(input$method)) geom_line(aes(color = group), size = size, alpha = 0.6)
} + {
# if input has columns method and group, plot line and color by method & linetype by group
if (!is.null(input$group) & !is.null(input$method)) geom_line(aes(color = method, linetype = group), size = size, alpha = 0.6)
} + {
# set linetype, if input has columns method and group
if (!is.null(input$group) & !is.null(input$method)) scale_linetype_manual(values = c("solid", "twodash", "dotted"))
} + {
# if input has neither columns method nor group, plot black line
if (is.null(input$group) & is.null(input$method)) geom_line(size = size, alpha = 0.6)
} + {
# if QTL positions are given in labels df, plot as point...
if (!is.na(labels)[1]) geom_point(data = labels, aes(x = pos, y = lod))
} + {
# ... and plot name as text with ggrepel to avoid overlapping
if (!is.na(labels)[1]) geom_text_repel(data = labels, aes(x = pos, y = lod, label = name), nudge_y = 0.5)
} +
# facet by chromosome
facet_wrap(~ chr, ncol = ncol, scales = "free_x") +
# minimal plotting theme
theme_minimal() +
# increase strip title size
theme(strip.text = element_text(face = "bold", size = 12),
plot.title = element_text(hjust = 0.5, size = 12, face = "bold"),
axis.text.x = x.label,
panel.spacing = unit(0, "lines"), #space between chromosomes graphs
panel.border = element_rect(colour = "#d9d9d9", fill=NA, size=0.4, linetype = "solid"),
panel.grid = element_blank()) +
# use RcolorBrewer palette
scale_color_brewer(palette = "Set1") +
# Change plot labels
labs(x = "Chromosome",
y = ylab,
color = "",
linetype = "") +
ggtitle(title)
if(is.data.frame(int)==TRUE){
if(!is.na(chrs)[1] & length(chrs)==1){
#add interval for peak
plot <- plot + geom_segment(x=int[1,2],y=min(input[,3]),xend=int[1,2],yend=int[1,3],linetype="dashed",color="darkblue") +
geom_segment(x=int[3,2],y=min(input[,3]),xend=int[3,2],yend=int[3,3],linetype="dashed",color="darkblue") +
geom_segment(x=int[2,2],y=min(input[,3]),xend=int[2,2],yend=int[2,3],color="darkblue")
} else {
stop("only one chromosome must be displayed to plot confidence interval")
}
}
return(plot)
}
This diff is collapsed.
No preview for this file type
......@@ -2,12 +2,12 @@
% Please edit documentation in R/geno_strains.R
\name{geno_strains}
\alias{geno_strains}
\title{Create haplotype for a new mouse strain into a reference dataframe}
\title{Create haplotype for a new mouse strain into a dataframe}
\usage{
geno_strains(ref, geno, par1, par2, name1, name2)
}
\arguments{
\item{ref}{data frame with the reference genotypes of mouse lines}
\item{ref}{data frame with the annotation of the arrray used}
\item{geno}{data frame with the genotyping results for your cross from miniMUGA array}
......
......@@ -5,9 +5,11 @@
\alias{stuart_tab}
\title{Output of tab_mark function}
\format{
A data frame with 11125 rows and 7 variables
A data frame with 11125 rows and 9 variables
\describe{
\item{marker}{name of the marker}
\item{chr}{chromosome of the marker}
\item{cM_cox}{position of the marker}
\item{allele_1}{first allele of the marker}
\item{allele_2}{second allele of the marker}
\item{n_HM1}{number of homozygous individuals for the first allele}
......
......@@ -4,10 +4,12 @@
\alias{tab_mark}
\title{Create of the summary table for all markers from the genotype data frame}
\usage{
tab_mark(geno)
tab_mark(geno, annot, pos)
}
\arguments{
\item{geno}{data frame with the genotyping results for your cross}
\item{ref}{data frame with the annotation of the arrray used}
}
\description{
This function creates a table with all the markers that were genotyped in the array, the alleles for these markers, the number of homozygous and heterozygous animals, as well as the number of non genotyped animals.
......
No preview for this file type
No preview for this file type
......@@ -87,7 +87,7 @@ genos <- genos %>% filter(!Sample.ID %in% c("StrainsA_1", "StrainsA_2",
### Marker tab
The first step of the markers sorting is to create the marker dataframe with the tab_mark() function. This dataframe contains for each marker the two alleles that can be found in the F2/N2 population (`Allele_1` and `Allele_2`), the number of individuals for each genotype (homozygous for each allele (`n_HM1` and `n_HM2`) and heterozygous (`n_HT`)), and the number of non genotyped individuals (`n_NA`) This step can take several minutes, so you can also load the example output of this function.
The first step of the markers sorting is to create the marker dataframe with the tab_mark() function. This dataframe contains for each marker its chromosome (`chr`) and position (`cM_cox` was chosen from annot_mini), the two alleles that can be found in the F2/N2 population (`Allele_1` and `Allele_2`), the number of individuals for each genotype (homozygous for each allele (`n_HM1` and `n_HM2`) and heterozygous (`n_HT`)), and the number of non genotyped individuals (`n_NA`) This step can take several minutes, so you can also load the example output of this function.
```{r tab_mark}
......
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment