#' @title Add haplotypes of new mouse strains to a reference data frame
#'
#' @description This function adds columns for the parental strains used in the cross to the annotation data frame, using the genotype data frame in which one or several animals of the parental strains were genotyped.
#' If several animals of one strain were genotyped, a consensus is created from these animals.
#' The consensus is created as follows: if the individuals carry the same allele, this allele is kept; otherwise, the allele is noted as "N". If individuals show residual heterozygosity, it is encoded as "H".
#' @param ref data frame with the reference genotypes of mouse lines
#' @param geno data frame with the genotyping results for your cross from the miniMUGA array
#' @param par1 first parental strain used in the cross, the name must be written as in the geno data frame
#' @param par2 second parental strain used in the cross, the name must be written as in the geno data frame
#' @param name1 name of the first parental strain to use as the column name in the ref data frame
#' @param name2 name of the second parental strain to use as the column name in the ref data frame
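#' @examples
#' # Minimal sketch of the consensus rule described above, not the package's
#' # implementation. `consensus_call` is a hypothetical helper. It assumes that
#' # each animal's call at a marker is already a single code ("A", "C", "G",
#' # "T", "H" or "N"), so animals that agree on "H" give "H".
#' consensus_call <- function(calls) {
#'   calls <- unique(calls)
#'   if (length(calls) == 1) calls else "N"
#' }
#' consensus_call(c("A", "A")) # same allele: kept
#' consensus_call(c("A", "G")) # discordant animals: "N"
#' consensus_call(c("H", "H")) # residual heterozygosity: "H"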
#' @title Exclude markers that have different alleles in the individuals of the cross and in parental strains
#'
#' @description This function uses the data frame produced by the tab_mark function and fills the "exclude" column for all the markers whose alleles observed in the individuals of the cross do not correspond to the alleles observed in the parental strains. For example, a marker which is not polymorphic between the two parental strains but which has two alleles in the cross individuals will be excluded.
#' @param tab data frame obtained with tab_mark function
#' @param ref data frame with the reference genotypes of mouse lines
#' @param par1 first parental strain used in the cross, the name must be written as in the "ref" data frame
#' @param par2 second parental strain used in the cross, the name must be written as in the "ref" data frame
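#' @examples
#' # Minimal sketch of the exclusion rule described above, not the package's
#' # implementation. `alleles_mismatch` is a hypothetical helper that takes the
#' # alleles seen in the cross and the alleles of the two parental strains.
#' alleles_mismatch <- function(cross_alleles, parent_alleles) {
#'   !all(is.element(cross_alleles, parent_alleles))
#' }
#' # Parents not polymorphic ("A"/"A") but two alleles in the cross: excluded
#' alleles_mismatch(c("A", "G"), c("A", "A"))
#' # All cross alleles are found in the parents: kept
#' alleles_mismatch(c("A", "G"), c("A", "G"))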
#'
#' @import dplyr
#'
#' @export
#'
mark_allele<-function(tab,ref,par1,par2){
#markers of ref df as characters
ref$marker<-as.character(ref$marker)
colnames(ref)<-make.names(colnames(ref))
#recode parents' names to match column names nomenclature
#' @title Exclude markers that were not genotyped in the reference strains
#'
#' @description This function uses the data frame produced by the tab_mark function and fills the "exclude" column for all the markers that were genotyped in the individuals of the cross but not in the reference strains. This is useful if the parental strains of the cross were not genotyped together with the individuals and a previous genotyping result is used, since the markers on the array may have changed in the meantime. We recommend always using this function in order to avoid errors.
#' @param tab data frame obtained with tab_mark function
#' @param ref data frame with the reference genotypes of mouse lines
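#' @examples
#' # Minimal sketch of the matching step described above, not the package's
#' # implementation. The `SNP.Name` and `marker` columns follow the code of
#' # this function; the toy data and the "match" flag written to `exclude`
#' # are assumptions made for illustration.
#' tab_toy <- data.frame(SNP.Name = c("m1", "m2", "m4"),
#'                       exclude = NA_character_,
#'                       stringsAsFactors = FALSE)
#' ref_toy <- data.frame(marker = c("m1", "m2", "m3"),
#'                       stringsAsFactors = FALSE)
#' # markers genotyped in the cross but absent from the reference strains
#' missing_in_ref <- !is.element(as.character(tab_toy$SNP.Name),
#'                               as.character(ref_toy$marker))
#' tab_toy$exclude[missing_in_ref] <- "match"
#' tab_toy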
#'
#' @import dplyr
#'
#' @export
#'
mark_match<-function(tab,#tab_mark df
ref){#strain ref geno file
#finds SNPs that are in both files:
snp_strains<-as.character(ref$marker)#extracts SNPs in strains ref geno file
snp_genfile<-as.character(tab$SNP.Name)#extracts SNPs in cross geno file
#' @title Exclude markers that are not polymorphic
#'
#' @description This function uses the data frame produced by the tab_mark function and fills the "exclude" column for all the markers that are not polymorphic.
#' @param tab data frame obtained with tab_mark function
#' @title Exclude markers depending on proportions of homozygous/heterozygous individuals
#'
#' @description This function uses the data frame produced by the tab_mark function and fills the "exclude" column for all the markers that present odd proportions of each genotype. You can define these proportions with the arguments of the function.
#' @param tab data frame obtained with tab_mark function.
#' @param cross F2 or N2.
#' @param homo proportion of homozygous individuals under which the marker is excluded. Applies to both homozygous genotypes for an F2, but only to one for an N2.
#' @param hetero proportion of heterozygous individuals under which the marker is excluded.
#' @param na proportion of non-genotyped individuals above which the marker is excluded.
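#' @examples
#' # Minimal sketch of the thresholds described above, not the package's
#' # implementation. `flag_odd_props`, the count column names, the default
#' # thresholds and the choice of denominators (genotyped animals for
#' # homo/hetero, all animals for na) are assumptions. For an N2, the homo
#' # threshold is assumed to apply to the more frequent homozygous genotype.
#' flag_odd_props <- function(counts, cross = "F2",
#'                            homo = 0.1, hetero = 0.1, na = 0.5) {
#'   n_tot  <- counts$n_hom1 + counts$n_hom2 + counts$n_het + counts$n_na
#'   n_geno <- n_tot - counts$n_na
#'   p_hom1 <- counts$n_hom1 / n_geno
#'   p_hom2 <- counts$n_hom2 / n_geno
#'   p_het  <- counts$n_het / n_geno
#'   p_na   <- counts$n_na / n_tot
#'   low_homo <- if (cross == "F2") {
#'     p_hom1 < homo | p_hom2 < homo
#'   } else {
#'     pmax(p_hom1, p_hom2) < homo
#'   }
#'   low_homo | p_het < hetero | p_na > na
#' }
#' counts_toy <- data.frame(
#'   marker = c("m1", "m2", "m3"),
#'   n_hom1 = c(25, 2, 15),
#'   n_hom2 = c(24, 48, 10),
#'   n_het  = c(50, 49, 15),
#'   n_na   = c(1, 1, 60)
#' )
#' flag_odd_props(counts_toy, cross = "F2") # m2 (rare homozygote) and m3 (many NA)
#' flag_odd_props(counts_toy, cross = "N2") # only m3 (many NA)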
#'
#' @import dplyr
#'
#' @export
#'
#### mark_prop ####
## excludes markers depending on proportions of homo/heterozygous
#' Data frame with miniMUGA genotyping of classical lab strains.
#'
#' A dataset containing the genotypes of 10 mouse strains genotyped at the Institut Pasteur. Marker positions and other information are from Karl Broman (https://kbroman.org/MUGAarrays/mini_revisited.html).
#'
#' @format A data frame with 11299 rows and 18 variables
#' \describe{
#' \item{CC001}{CC001 mouse strain}
#' \item{CC005}{CC005 mouse strain}
#' \item{CC042}{CC042 mouse strain}
#' \item{CC071}{CC071 mouse strain}
#' \item{Ifnar.KO.129}{Ifnar KO 129 mouse strain}
#' \item{Ifnar.KO.B6}{Ifnar KO B6 mouse strain}
#' \item{Rvfs2.1}{Rvfs2-1 mouse strain}
#' \item{Rvfs2.2}{Rvfs2-2 mouse strain}
#' \item{Rvfs2.6}{Rvfs2-6 mouse strain}
#' \item{Rvfs2.7}{Rvfs2-7 mouse strain}
#' \item{marker}{name of the marker}
#' \item{chr}{chromosome}
#' \item{bp_mm10}{localisation on chromosome in bp (mm10 assembly)}
#' \item{cM_cox}{localisation on chromosome in cM (from Cox et al.)}
#' \item{cM_g2f1}{localisation on chromosome in cM (from Liu et al.)}
#' \item{snp}{marker alleles}
#' \item{unique}{indicates if the marker maps uniquely on mm10}
#' \item{multi}{indicates if the marker maps more than one time on mm10}
#' \item{unmapped}{indicates if the marker does not map perfectly on mm10}
#' @title Create the summary table for all markers from the genotype data frame
#'
#' @description This function creates a table with all the markers that were genotyped in the array, the alleles for these markers, the number of homozygous and heterozygous animals, as well as the number of non-genotyped animals.
#' @param geno data frame with the genotyping results for your cross
#' @description This function uses the table produced by the tab_mark function and filled by all the mark_* functions to create a data frame in the right format for the R/qtl read.cross function. Only the non-excluded markers are kept, and genotypes are encoded as "0", "1" and "2": "0" is homozygous for the first parental strain, "1" heterozygous and "2" homozygous for the second parental strain. Caution: this function creates a data frame and, if the "path" argument is given, also writes a CSV file to that path. It does not create a "cross" object in your environment that can be directly used for QTL mapping; you will need to load the CSV file with qtl::read.cross.
#' @param geno data frame with the genotyping results for your cross
#' @param pheno data frame with phenotypes of the individuals (individuals must have the same ID in the geno data frame and in the pheno data frame)
#' @param prefix optional prefix present in the names of the individuals in the geno data frame, removed in order to have the same names as in the pheno data frame
#' @param tab data frame obtained with tab_mark function
#' @param ref data frame with the reference genotypes of mouse lines
#' @param par1 first parental strain used in the cross, the name must be written as in the "ref" data frame
#' @param par2 second parental strain used in the cross, the name must be written as in the "ref" data frame
#' @param method method of calculation of cM position, can be "cM_cox" or "cM_g2f1"
#' @param path if indicated, the data frame will be exported in this path
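#' @examples
#' # Minimal sketch of the "0"/"1"/"2" encoding described above, not the
#' # package's implementation. `encode_geno` is a hypothetical helper; it
#' # assumes genotypes are two-letter strings such as "AG" and that the
#' # allele of each parental strain at the marker is known.
#' encode_geno <- function(geno, allele1, allele2) {
#'   hom1 <- paste0(allele1, allele1)
#'   hom2 <- paste0(allele2, allele2)
#'   het  <- c(paste0(allele1, allele2), paste0(allele2, allele1))
#'   out <- rep(NA_character_, length(geno))
#'   out[is.element(geno, hom1)] <- "0" # homozygous for the first parent
#'   out[is.element(geno, het)]  <- "1" # heterozygous
#'   out[is.element(geno, hom2)] <- "2" # homozygous for the second parent
#'   out
#' }
#' # any other call (here "--") stays NA
#' encode_geno(c("AA", "AG", "GA", "GG", "--"), allele1 = "A", allele2 = "G")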
#'
#' @import dplyr
#' @import tidyr
#' @import utils
#' @import stringr
#'
#' @export
#'
#### write_rqtl ####
## write data frame in rqtl format (csv), if path != NA writes the file in the path indicated