diff --git a/README.md b/README.md index 33500ab9f104ac4336ea7c3e1705debdc01848cb..473fe56c974fad8f568f9adbcb6b0a0968e5c0ec 100644 --- a/README.md +++ b/README.md @@ -15,9 +15,9 @@ Analysis of GWAS data is indeed purely computational, and performed by scientist In this broader context, CC-QTL positions itself as a one-stop shop interface dedicated to QTL mapping on [Collaborative Cross](https://www.nature.com/articles/ng1104-1133) data, a widely used mouse mapping population that derives from 8 founder strains. -CC-QTL includes both a user-friendly GUI allowing end-to-end QTL mapping analysis (from data transformation to exploration of the QTL interval, e.g. identifying candidate genes) and a database structure guaranteeing the safe and organized storage of phenotypic data along with an advanced permissions system. +CC-QTL includes both a user-friendly GUI allowing end-to-end QTL mapping analysis (from data transformation to exploration of the QTL interval, e.g. identifying candidate genes) and a database structure guaranteeing the safe and organized storage of phenotypic data along with an advanced permissions system. -CC-QTL’s main goal is to allow non-specialists (mouse experimental geneticists working on their own or publicly available data, trainers for demo or teaching purposes) to explore and analyze data by themselves. However, the added bonuses of Galaxy-powered analysis reproducibility and of the permission system make CC-QTL also relevant for more experienced users, e.g. facilities. +CC-QTL’s main goal is to allow non-specialists (mouse experimental geneticists working on their own or publicly available data, trainers for demo or teaching purposes) to explore and analyze data by themselves. However, the added bonuses of Galaxy-powered analysis reproducibility and of the permission system make CC-QTL also relevant for more experienced users, e.g. facilities. 
CC-QTL is still under active development and its [documentation](https://gitlab.pasteur.fr/cc-qtl/cc-qtl-db/-/tree/to-final-gitlab-orga?ref_type=heads#cc-qtl-db) is being updated regularly. You are thus strongly encouraged to always use the latest release of CC-QTL. @@ -116,6 +116,7 @@ Go to [http://localhost:8889/](http://localhost:8889/) and login with the create ## Usage ### Foreword + CC-QTL includes both a database ready to be populated and an analysis framework. The CC-QTL interface is built around the concept of a project. This holds both in terms of permissions (you need to be a member of a project to view/modify objects attached to a project, more on permissions [here](là)) and in terms of data organisation. @@ -126,22 +127,23 @@ An example organisation is shown here.  -As a rule of thumb, each project can be considered as a "biological question" (involving several people, the "project users"), which is being addressed in several steps, each being an experiment. As such, each experiment corresponds to a given set of phenotypes, which will be subjected to QTL mapping once uploaded into the interface. For each experiment, it will then be possible to run analyses several times, tuning the parameters, e.g. by enlarging the QTL support interval (confidence interval around the peak), being more or less stringent on the statistical threshold, re-running it all on a new version of the genome, etc. In line with the strong requirement for reproducibility while doing such back-and-forth analyses, all this is wrapped into a Galaxy-powered workflow, thus guaranteeing proper tracking of the parameters that were used. +As a rule of thumb, each project can be considered as a "biological question" (involving several people, the "project users"), which is being addressed in several steps, each being an experiment. As such, each experiment corresponds to a given set of phenotypes, which will be subjected to QTL mapping once uploaded into the interface. 
For each experiment, it will then be possible to run analyses several times, tuning the parameters, e.g. by enlarging the QTL support interval (confidence interval around the peak), being more or less stringent on the statistical threshold, re-running it all on a new version of the genome, etc. In line with the strong requirement for reproducibility while doing such back-and-forth analyses, all this is wrapped into a Galaxy-powered workflow, thus guaranteeing proper tracking of the parameters that were used. To allow the user to take his or her first steps with CC-QTL, the [testdata](https://gitlab.pasteur.fr/cc-qtl/cc-qtl-db/-/tree/to-final-gitlab-orga/testdata) folder contains an example of phenotypic data to be passed as input. The recommended way to proceed (described in more detail below) is as follows: creating a project and a first experiment within this project, uploading phenotype data, annotating the data, exploring the data, running the QTL analysis. ### Getting started: project creation -Upon deployment, the user arrives on a blank interface. To start with, a first project must be created, by going to "Projects" in the left panel menu, then "+", then filling in the project name and description. It is possible to edit the project name and description later on, by selecting the pen icon on the project tile. + +Upon deployment, the user arrives on a blank interface. To start with, a first project must be created, by going to "Projects" in the left panel menu, then "+", then filling in the project name and description. It is possible to edit the project name and description later on, by selecting the pen icon on the project tile.  The blue bubbles, empty on first deployment, show how many projects, experiments and analyses exist in the interface, on the basis of what the user is allowed to see (see [permissions](https://gitlab.pasteur.fr/cc-qtl/cc-qtl-db/-/tree/to-final-gitlab-orga#permission-systems)). 
- ### Creating an experiment and uploading phenotype data -Once a project exists, a first experiment must be created, as described in the following picture. + +Once a project exists, a first experiment must be created, as described in the following picture.  @@ -160,6 +162,7 @@ The headers of the four first fields (Individual_ID, Mother_Line, Father_Line, S - Mother_Line, Father_Line: "short" identifier of the CC line (e.g. CC001 rather than CC001/Unc, the /suffix referring to the institution where the line was developed). In the case of CC lines, dam and sire (Mother_Line and Father_Line, respectively) are by design expected to be identical; however, both fields need to be filled. Albeit seemingly daunting, this structure is critical to allow extension to other mapping populations (see [FAQ](https://gitlab.pasteur.fr/cc-qtl/cc-qtl-db/-/tree/to-final-gitlab-orga/#can-i-use-ccqtl-on-another-mapping-population-)) + - Sex: should be provided as M or F. - Pheno1: A float, using a dot and not a comma for decimals. Phenotypic value for phenotype "Pheno1". @@ -167,26 +170,26 @@ Data can be provided either as mean values (one single value per line or line x Do note that "Pheno1" will be auto-detected as phenotype name upon upload in the interface. Phenotype naming should thus avoid spaces (to be substituted with dots, underscores, capitalization, etc.) and special characters. Abbreviations or acronyms can be used, since the critical phenotypic information (proper description of what exactly is being profiled) can/should be provided with controlled vocabulary, as described [here](https://gitlab.pasteur.fr/cc-qtl/cc-qtl-db/-/tree/to-final-gitlab-orga/#annotating-phenotype-data-using-controlled-vocabulary). - Formatting requirements regarding phenotypic input data (input file format, use of means or individual values) are provided as help buttons in the interface. 
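The formatting rules above can be checked on a file before it is uploaded. The following is a minimal sketch in Python and is not part of CC-QTL: the function name `check_phenotype_table`, the delimiter parameter, the regular expressions and the wording of the messages are all assumptions made for illustration. Only the four header names, the M/F coding of Sex, the dot-decimal floats and the phenotype naming advice come from the text above.

```python
import csv
import io
import re

# Header names taken from the README; everything else is illustrative.
REQUIRED_FIELDS = ["Individual_ID", "Mother_Line", "Father_Line", "Sex"]
# Assumed reading of "avoid spaces and special characters".
SAFE_NAME = re.compile(r"^[A-Za-z0-9._]+$")
# Dot-decimal number such as 12, 12.5 or -0.3. Comma decimals are rejected;
# scientific notation is rejected too, which is a simplification of this sketch.
DOT_FLOAT = re.compile(r"^[+-]?(\d+(\.\d*)?|\.\d+)$")


def check_phenotype_table(text, delimiter=","):
    """Return a list of human-readable problems found in a phenotype table."""
    problems = []
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    header = reader.fieldnames or []

    if header[:4] != REQUIRED_FIELDS:
        problems.append(f"first four headers must be {REQUIRED_FIELDS}")
        return problems

    phenotypes = header[4:]
    for name in phenotypes:
        if not SAFE_NAME.match(name):
            problems.append(f"phenotype name {name!r} has spaces or special characters")

    # Data rows start on line 2 of the file (line 1 is the header).
    for line_no, row in enumerate(reader, start=2):
        if row["Sex"] not in ("M", "F"):
            problems.append(f"line {line_no}: Sex must be M or F")
        if row["Mother_Line"] != row["Father_Line"]:
            # Dam and sire are expected to be identical for CC lines.
            problems.append(f"line {line_no}: Mother_Line and Father_Line differ")
        for name in phenotypes:
            value = (row[name] or "").strip()
            if not DOT_FLOAT.match(value):
                problems.append(f"line {line_no}: {name} is not a dot-decimal number")
    return problems
```

An empty list means that none of the checks failed. The check on identical dam and sire only makes sense for CC lines and would have to be dropped for other mapping populations.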
### Annotating phenotype data using controlled vocabulary -To make full use of the database, it is strongly recommended that you annotate your phenotypes since the phenotype label alone can hardly capture exactly what was measured. +To make full use of the database, it is strongly recommended that you annotate your phenotypes since the phenotype label alone can hardly capture exactly what was measured. To do so, CC-QTL offers a free-text zone, preselected values (is it organ specific vs whole body measurements, are the phenotypic values floats or integers, continuous or categorical) but also controlled vocabulary, namely, ontologies. The ontology terms associated with a given phenotype are to be provided in the "Phenotype category" menu of the interface. Selection of the appropriate ontology terms is performed through an autocompletion field that uses the description of the ontology term (rather than the ontology term identifier, e.g. MP:xxxx) to propose matching terms to the user. As many terms as deemed relevant can be selected. -CC-QTL ships with a minimal ontology (1248 terms in total) that corresponds to the different terms currently used for [phenotype annotation in the Mouse Phenome Database](https://phenome.jax.org/about/ontologies). It is a merge of the three main ontologies used in the field: Mouse Anatomy ontology (MA, subsetted to 153 terms), Mammalian Phenotype ontology (MP, subsetted to 585 terms) and Vertebrate Trait ontology (VT, subsetted to 510 terms). +CC-QTL ships with a minimal ontology (1248 terms in total) that corresponds to the different terms currently used for [phenotype annotation in the Mouse Phenome Database](https://phenome.jax.org/about/ontologies). It is a merge of the three main ontologies used in the field: Mouse Anatomy ontology (MA, subsetted to 153 terms), Mammalian Phenotype ontology (MP, subsetted to 585 terms) and Vertebrate Trait ontology (VT, subsetted to 510 terms). 
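To give an idea of what matching on the term description (rather than on the identifier) means in practice, here is a small Python sketch. It is only an illustration and does not reflect how the CC-QTL autocompletion field is implemented: the function `suggest_terms`, the ranking rule and the `EX:` identifiers and descriptions are all made up for the example.

```python
def suggest_terms(query, terms, limit=10):
    """Return (identifier, description) pairs whose description contains every query word."""
    words = query.lower().split()
    if not words:
        return []
    matches = [
        (term_id, description)
        for term_id, description in terms
        if all(word in description.lower() for word in words)
    ]
    # Assumed ranking: shorter descriptions first, so generic terms come up first.
    matches.sort(key=lambda pair: (len(pair[1]), pair[1]))
    return matches[:limit]


# Made-up identifiers and descriptions, for illustration only.
TERMS = [
    ("EX:0001", "body weight"),
    ("EX:0002", "decreased body weight"),
    ("EX:0003", "heart weight"),
    ("EX:0004", "liver morphology"),
]

if __name__ == "__main__":
    for term_id, description in suggest_terms("body weight", TERMS):
        print(term_id, description)
```

The user types words from the description and never needs to know the identifier, which is only carried along with each proposed term.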
-Along with the ontology terms, users can also make use of the free-text zone, which can be handy for providing experimental details (e.g., in the case of genetic resistance to a given pathogen, the amount of pathogen injected into the mouse) in a more user-friendly (although less searchable) manner. Overall, we recommend using ontology terms as much as possible. +Along with the ontology terms, users can also make use of the free-text zone, which can be handy for providing experimental details (e.g., in the case of genetic resistance to a given pathogen, the amount of pathogen injected into the mouse) in a more user-friendly (although less searchable) manner. Overall, we recommend using ontology terms as much as possible. ### Exploring the phenotypic data + The distribution of phenotypic values must be roughly Gaussian to satisfy the statistical assumptions behind QTL mapping. To allow the user to easily check the distribution of each phenotype, CC-QTL includes a module for plotting and transforming phenotype value distributions. -To do so, the user selects the phenotype of interest in the drop-down menu and plots it, which makes it easy to check whether the distribution roughly fits a Gaussian. Several common mathematical transformations can then be applied to see whether they improve the fit; a help menu gives some indications on the most appropriate transformation given the data. +To do so, the user selects the phenotype of interest in the drop-down menu and plots it, which makes it easy to check whether the distribution roughly fits a Gaussian. Several common mathematical transformations can then be applied to see whether they improve the fit; a help menu gives some indications on the most appropriate transformation given the data. -If the result is satisfactory, the user can then save the transformed data, e.g. to reload them as a new experiment file. 
As this step is not yet embedded in the Galaxy workflow, users need to carefully document the mathematical transformations performed on the data. +If the result is satisfactory, the user can then save the transformed data, e.g. to reload them as a new experiment file. As this step is not yet embedded in the Galaxy workflow, users need to carefully document the mathematical transformations performed on the data.  @@ -197,6 +200,7 @@ Note that this plotting module allows to look at data distribution not only over  ### Running an analysis (QTL mapping) on the aforementioned experiment + Now that an experiment has been created and loaded with phenotypic data whose distribution is compatible with QTL mapping, it is possible to run an analysis, namely, a QTL mapping Galaxy workflow. One analysis corresponds to a given QTL mapping workflow, launched with specific parameters. @@ -206,27 +210,38 @@ To launch an analysis, the user must: create a new analysis in the corresponding  -QTL mapping and all related tasks are performed using a series of tools from the [R/qtl2 suite](https://kbroman.org/qtl2/index.html). At present, CC-QTL only handles QTL mapping on continuous phenotype data (as opposed to binary traits, discussed [here](https://gitlab.pasteur.fr/cc-qtl/cc-qtl-db/-/tree/to-final-gitlab-orga?ref_type=heads#is-my-data-appropriate-for-qtl-mapping-with-cc-qtl-)) on CC mice (as opposed to other populations, see [here](https://gitlab.pasteur.fr/cc-qtl/cc-qtl-db/-/tree/to-final-gitlab-orga?ref_type=heads#can-i-use-cc-qtl-on-another-mapping-population-)): as such, there is only one Galaxy workflow; more will become available in later releases. +QTL mapping and all related tasks are performed using a series of tools from the [R/qtl2 suite](https://kbroman.org/qtl2/index.html). At present, CC-QTL only handles QTL mapping on continuous phenotype data (as opposed to 
binary traits, discussed [here](https://gitlab.pasteur.fr/cc-qtl/cc-qtl-db/-/tree/to-final-gitlab-orga?ref_type=heads#is-my-data-appropriate-for-qtl-mapping-with-cc-qtl-)) on CC mice (as opposed to other populations, see [here](https://gitlab.pasteur.fr/cc-qtl/cc-qtl-db/-/tree/to-final-gitlab-orga?ref_type=heads#can-i-use-cc-qtl-on-another-mapping-population-)): as such, there is only one Galaxy workflow; more will become available in later releases. +The workflow consists of the following steps. Some steps require the user to specify some parameters, others don't. The user has no control over parameters that have been tailored for the CC (although the values that were used appear in the Galaxy recaps). For parameters that can be modified at the user's discretion, default values are indicated and most of the drop-down menus only allow a limited, appropriate number of choices. -The workflow consists of the following steps. Some steps require the user to specify some parameters, others don't. The user has no control over parameters that have been tailored for the CC (although the values that were used appear in the Galaxy recaps). For parameters that can be modified at the user's discretion, default values are indicated and most of the drop-down menus only allow a limited, appropriate number of choices. +#### **format_data** -- **format_data**: given the phenotypes passed as input and the selected parameters (set of genetic markers, etc.), creation of all the input files needed for qtl2. +given the phenotypes passed as input and the selected parameters (set of genetic markers, etc.), creation of all the input files needed for qtl2.  -- **compute_genoprobs**: computation of genotype probabilities (haplotype reconstruction: from the genotype information at genetic markers in all CCs, infer the most likely genotype between markers) and kinship matrix, to be used in the linear mixed model (LMM). No parameters are to be selected by the user for this step. 
+#### **compute_genoprobs** + +computation of genotype probabilities (haplotype reconstruction: from the genotype information at genetic markers in all CCs, infer the most likely genotype between markers) and kinship matrix, to be used in the linear mixed model (LMM). No parameters are to be selected by the user for this step.  -- **genome_scan**: QTL mapping per se, in the form of single-marker regression using an LMM to account for kinship: at each marker, fits a full model (phenotype as a function of genotype & kinship) vs a null model (phenotype as a function of kinship) and computes the LOD. No parameters are to be selected by the user for this step. +#### **genome_scan** + +QTL mapping per se, in the form of single-marker regression using an LMM to account for kinship: at each marker, fits a full model (phenotype as a function of genotype & kinship) vs a null model (phenotype as a function of kinship) and computes the LOD. No parameters are to be selected by the user for this step.  -- **permutations**: permutation-based strategy to define the maximal LOD score expected at random. Note that the permutation strategy accommodates (through the kinship) unequal numbers of individuals per line. The user needs to specify how many permutations are to be performed. +#### **permutations** + +permutation-based strategy to define the maximal LOD score expected at random. Note that the permutation strategy accommodates (through the kinship) unequal numbers of individuals per line. The user needs to specify how many permutations are to be performed.  -- **find_peaks**: given the LOD values and the threshold defined by permutation, identify which peaks are statistically supported QTL. For each QTL, a confidence interval (also called "QTL support interval", which is the region of the genome in which the QTL is likely to be found, since the bona fide QTL is seldom under the peak) is also produced; its size can be defined by the user using a drop-off approach. 
The effects (how much of the heritable phenotypic variation is explained by this QTL) and direction of effects (out of the 8 parental origins, which one gives this phenotype) are also estimated. The user needs to provide the statistical alpha, whether or not correction for multiple phenotypes considered at once should be performed, and the drop-off value. +#### **find_peaks** + +given the LOD values and the threshold defined by permutation, identify which peaks are statistically supported QTL. For each QTL, a confidence interval (also called "QTL support interval", which is the region of the genome in which the QTL is likely to be found, since the bona fide QTL is seldom under the peak) is also produced; its size can be defined by the user using a drop-off approach. The effects (how much of the heritable phenotypic variation is explained by this QTL) and direction of effects (out of the 8 parental origins, which one gives this phenotype) are also estimated. The user needs to provide the statistical alpha, whether or not correction for multiple phenotypes considered at once should be performed, and the drop-off value.  -- **refine_regions**: in the region surrounding the QTL peak, extract all gene annotations and perform a local association study using DNA sequence variants that distinguish the founder strains. The objective of this step is to perform a first-level characterization of the genomic region that contains the QTL: extracting the genes and assessing which SNPs are most strongly associated with the phenotype. The combination of the two forms a first, easy-to-perform candidate gene selection. For this step, the user needs to provide the size of the region to be further analysed as well as sqlite files for mouse genome annotation and DNA sequence variants segregating among the 8 founder strains (namely, SNPs, indels and SVs that are present/absent across the 8 founders). 
+#### **refine_regions** + +in the region surrounding the QTL peak, extract all gene annotations and perform a local association study using DNA sequence variants that distinguish the founder strains. The objective of this step is to perform a first-level characterization of the genomic region that contains the QTL: extracting the genes and assessing which SNPs are most strongly associated with the phenotype. The combination of the two forms a first, easy-to-perform candidate gene selection. For this step, the user needs to provide the size of the region to be further analysed as well as sqlite files for mouse genome annotation and DNA sequence variants segregating among the 8 founder strains (namely, SNPs, indels and SVs that are present/absent across the 8 founders).  Depending on the workflow step and the type of Galaxy (deployed locally on laptop or cluster-based), the workflow can take some time. The status of each step (pending, running, ran, queueing) as well as of the whole analysis workflow is available in the Analysis menu on the interface. @@ -234,34 +249,35 @@ Depending on the workflow step and the type of Galaxy (deployed locally on lapto Similarly to what can be done on Galaxy, at the end of the analysis, it is possible to either make a copy of the analysis (e.g. to re-run it as such, using all the same parameters but on a different experiment, e.g. an updated phenotyping experiment) or download the data files (as rds files) that were generated. #### Example results -Once an analysis has run (green tick), it is possible to go and look at the results. + +Once an analysis has run (green tick), it is possible to go and look at the results. The very first plot that is provided is the regular genome scan LOD score plot, with or without making the threshold visible on the plot. The QTL support interval is shown as orange shading around the peak; hovering over the LOD profile displays the marker and LOD score at that position.  
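The threshold line and the shaded support interval on the LOD plot follow from the permutations and find_peaks steps described above. The Python sketch below illustrates the two underlying ideas on a single chromosome; it is not the R/qtl2 code that CC-QTL actually runs. The function names, the simple empirical quantile and the default drop of 1.5 are assumptions chosen for the example, and R/qtl2 may differ in details such as quantile interpolation or how interval ends are placed.

```python
import math


def permutation_threshold(max_lods, alpha=0.05):
    """Value that at least a (1 - alpha) share of the permuted maximum LODs do not exceed."""
    if not max_lods:
        raise ValueError("need at least one permutation")
    ordered = sorted(max_lods)
    # 1-based rank of the empirical quantile; the small epsilon guards
    # against floating-point noise in (1 - alpha) * n.
    rank = math.ceil((1 - alpha) * len(ordered) - 1e-9)
    return ordered[max(rank, 1) - 1]


def support_interval(positions, lods, peak_index, drop=1.5):
    """Bounds of the contiguous region where LOD stays within `drop` of the peak LOD."""
    cutoff = lods[peak_index] - drop
    left = peak_index
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    right = peak_index
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions[left], positions[right]


def find_peak(positions, lods, threshold, drop=1.5):
    """Highest LOD on one chromosome, whether it passes the threshold, and its interval."""
    peak_index = max(range(len(lods)), key=lods.__getitem__)
    return {
        "position": positions[peak_index],
        "lod": lods[peak_index],
        "significant": lods[peak_index] >= threshold,
        "interval": support_interval(positions, lods, peak_index, drop),
    }
```

A larger drop value widens the interval, which is the tuning of the QTL support interval mentioned in the foreword. More permutations give a more stable threshold at the cost of a longer run time.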
-Below the LOD profile the list of peaks is provided as a table, with all relevant information (upper and lower bounds of the QTL interval, p-value). The table can, like all tables displayed in the interface, be downloaded by the user. *Note that on the illustration below, no significant peak was detected (note the high p-value); in such cases, the peak with the highest LOD score along with a minimal support interval is output.* +Below the LOD profile the list of peaks is provided as a table, with all relevant information (upper and lower bounds of the QTL interval, p-value). The table can, like all tables displayed in the interface, be downloaded by the user. _Note that on the illustration below, no significant peak was detected (note the high p-value); in such cases, the peak with the highest LOD score along with a minimal support interval is output._ By clicking on each peak, the user can access the following information for that particular peak: -- **Founder effects in QTL interval**: effects follow the color code of the CC. - +- **Founder effects in QTL interval**: effects follow the color code of the CC. +  - **Phenotype as a function of genotype at peak marker**: effects follow the color code of the CC. Effect plots are easier to interpret along with the LOD score plot below. Here, carrying the blue (founder strain XX) or gray (founder strain YY) haplotype at the QTL peak is associated with higher or lower phenotypic values, respectively, for that trait. - +  - **SNP association in peak region**: local association study. The list of SNPs and their p-values are provided in the table below the plot. - +  - **Phenotype as a function of genotype at top SNPs**: for the selected "top SNPs" (highest LOD scores following the local association study). Note the "NA" category in case the CC line genotype could not be measured at that position. - - -- **List of genes in QTL interval**: note this table is interactive and hyperlinks point to gene annotation databases. 
- +  +- **List of genes in QTL interval**: note this table is interactive and hyperlinks point to gene annotation databases. +  ## More advanced features ### Permission systems + Permissions are defined at the interface level as well as at the project level. - At the interface level, users can have regular user rights or be superusers. The latter are allowed to create/remove user accounts and have all rights on all projects. @@ -271,17 +287,19 @@ The permissions of the different users can be accessed as follows: - via the "Users" interface menu. It is accessible to all users of the interface and recaps who is user vs. superuser. - via the users icon for each project. It recaps who, for the given project, is manager vs maintainer. This is only accessible to the users allowed to see this given project. -- for each user, a recap of their permissions, both at the interface level (user/superuser) and at the project level (maintainer/manager for the different projects), is accessible in their "My account" menu. +- for each user, a recap of their permissions, both at the interface level (user/superuser) and at the project level (maintainer/manager for the different projects), is accessible in their "My account" menu. ### Deleting data -Projects, or experiments within a project, can be deleted using the trash bin icon. Deletion can only be performed by the project manager. It translates into permanent data loss and thus requires confirmation. This feature has been enabled to allow users to discover the interface (e.g. by creating dummy projects without filling the database permanently). It is however strongly discouraged for real data as it conflicts with the need for reproducibility. + +Projects, or experiments within a project, can be deleted using the trash bin icon. Deletion can only be performed by the project manager. It translates into permanent data loss and thus requires confirmation. This feature has been enabled to allow users to discover the interface (e.g. 
by creating dummy projects without filling the database permanently). It is however strongly discouraged for real data as it conflicts with the needs of reproducibility. ### Modifying/extending the ontologies used + CC-QTL ships with a minimal ontology described [earlier](https://gitlab.pasteur.fr/cc-qtl/cc-qtl-db/-/tree/to-final-gitlab-orga/#annotating-phenotype-data-using-controlled-vocabulary). There exists, however, many other ontologies that can be used to annotate (mouse) phenotypic data, either the full Mouse Anatomy, Vertebrate Trait or Mammalian Phenotype ontologies, or others such as [MIQAS-TAB (Minimal Information for QTLs and Association Studies Tabular)](http://miqas.sourceforge.net/specification/MIQAS_TAB/MIQAS_TAB_specification.html), [MMO (Measurement Method Ontology)](http://rgd.mcw.edu/rgdweb/ontology/view.html?acc_id=MMO:0000000) or [CMO (Clinical Measurement Ontology)](http://rgd.mcw.edu/rgdweb/ontology/view.html?acc_id=CMO:0000000). -Should the user wish to modify the ontologies associated with CC-QTL (remove some terms, add a different one, etc), it is possible to do so by editing the [corresponding json](https://gitlab.pasteur.fr/cc-qtl/cc-qtl-db/-/blob/to-final-gitlab-orga/server/api/fixtures/ontology.json). +Should the user wish to modify the ontologies associated with CC-QTL (remove some terms, add a different one, etc), it is possible to do so by editing the [corresponding json](https://gitlab.pasteur.fr/cc-qtl/cc-qtl-db/-/blob/to-final-gitlab-orga/server/api/fixtures/ontology.json). <!-- ### Removing all the data @@ -291,16 +309,19 @@ docker compose down -v --rmi all ``` If you want to start a fresh new install of cc-qtl, you need to remove some volumes. ---> +--> + ## FAQ -### Is my data appropriate for QTL mapping with CC-QTL ? +### Is my data appropriate for QTL mapping with CC-QTL ? 
+ The statistical framework currently underlying CC-QTL uses linear regression for QTL mapping (more precisely, an LMM with a kinship matrix), which is appropriate for continuous traits that exhibit a roughly Gaussian distribution. To validate this assumption, it is highly advised that the user explore the distribution of phenotypic values and, if needed, apply a mathematical transformation (log, inv, sqrt) to the data. To that end, a plotting/transformation module (one phenotype at a time) is available in CC-QTL. -Should the trait under scrutiny depart from such a distribution, even after mathematical transformation of the data, QTL detection will be greatly impeded, possibly raising spurious QTL peaks. -QTL mapping can be performed on non-Gaussian data by treating the trait as binary (e.g. infection score above or below a given score, instead of a continuum of score values) rather than continuous; in that case it uses logistic rather than linear regression, which is not yet implemented in CC-QTL. -In the meantime, a workaround for QTL mapping could be to remove phenotypic values that create strong outliers in the distribution, by re-uploading a "cleaned" input file. This will thus correspond to a new experiment, to which its corresponding analyses will be attached, thus guaranteeing tracking of the changes that were made and analysis reproducibility. +Should the trait under scrutiny depart from such a distribution, even after mathematical transformation of the data, QTL detection will be greatly impeded, possibly raising spurious QTL peaks. +QTL mapping can be performed on non-Gaussian data by treating the trait as binary (e.g. infection score above or below a given score, instead of a continuum of score values) rather than continuous; in that case it uses logistic rather than linear regression, which is not yet implemented in CC-QTL. 
+In the meantime, a workaround for QTL mapping could be to remove phenotypic values that create strong outliers in the distribution, by re-uploading a "cleaned" input file. This will thus correspond to a new experiment, to which its corresponding analyses will be attached, thus guaranteeing tracking of the changes that were made and analysis reproducibility. + +### Shall I provide individual phenotypic values or mean values at the line level ? -### Shall I provide individual phenotypic values or mean values at the line level ? One critical parameter to keep in mind to answer this question is the number of individuals phenotyped for each line. If the number of individuals is very heterogeneous across lines (e.g. 3 individuals for CC001 and 12 for CC002), mean phenotypic values computed at the line level are not readily comparable since the precision of the mean estimate is not the same (also called variance heterogeneity). To accommodate this real-life scenario (which can be frequent in preliminary experiments), CC-QTL allows the user to perform the QTL mapping using individual phenotypic values rather than mean values, thus properly dealing with unequal numbers of individuals across lines. This will be handled in the LMM and permutation strategy afterwards. @@ -308,13 +329,14 @@ To accomodate for this real-life scenario (which can be frequent in preliminary Note, however, that it is always preferable that phenotyping experiments involve balanced numbers of individuals across genotype categories and use adjusted mean values accounting for known covariates (cage effects, etc.). In that case, the user can provide one value per line (taking sex into account or not, depending on the phenotype expectations). ### Can I use CC-QTL on another mapping population ? + CC-QTL has been tailored for so-called 8-way RILs (recombinant inbred lines) at both the analytic (parameters, input files) and database levels. 
As such, CC-QTL could be used not only for QTL mapping in the Collaborative Cross, but also for other 8-way RIL mapping populations, such as MAGIC lines. It will however require some tinkering for some parameters (e.g. tweaking the Sex attribute for dioicous plant species). The analysis parameters, however, are not compatible with two-way RILs (namely, a mapping population that derives only from two parents, such as the mouse BxD mapping population) or, more generally speaking, with mapping populations deriving only from 2 parental strains. This stems from the fact that QTL mapping in two-parent mapping populations is less complex analysis-wise than in multiparent populations, and thus more accessible to experimental geneticists. -One key feature of CC-QTL is that, given its database structure, it is designed so that phenotypic data can be acquired for multiple traits, over different experiments performed by different users, with the long-term objective of assessing trait correlations. That requires the mouse genotypes to be reproducible, which is the case for CC lines, but not for DO (Diversity Outbred) lines, another mouse multiparental mapping population deriving from the same 8 founder lines as the CC, in which by design each line has a unique genetic makeup which cannot be reproduced. +One key feature of CC-QTL is that, given its database structure, it is designed so that phenotypic data can be acquired for multiple traits, over different experiments performed by different users, with the long-term objective of assessing trait correlations. That requires the mouse genotypes to be reproducible, which is the case for CC lines, but not for DO (Diversity Outbred) lines, another mouse multiparental mapping population deriving from the same 8 founder lines as the CC, in which by design each line has a unique genetic makeup which cannot be reproduced. -Ongoing work on CC-QTL (medium-term objective) aims to extend it to so-called CC-RIX, which are intercrosses between CC lines. 
CC-RIX are genetically reproducible like CCs, yet benefit from the buffering effect of heterozygosity and allow parental effects to be assessed. +Ongoing work on CC-QTL (medium-term objective) aims to extend it to so-called CC-RIX, which are intercrosses between CC lines. CC-RIX are genetically reproducible like CCs, yet benefit from the buffering effect of heterozygosity and allow parental effects to be assessed. ### How can I install the analysis workflow on Galaxy ? @@ -331,6 +353,7 @@ Keep in mind, though, that by using CC-QTL directly through Galaxy, you will get Feel free to [open an issue](https://gitlab.pasteur.fr/cc-qtl/cc-qtl-db/-/issues) if you could not find what you needed in the [FAQ](https://gitlab.pasteur.fr/-/ide/project/cc-qtl/cc-qtl-db/tree/to-final-gitlab-orga/-/README.md/#faq). ## About CC-QTL + CC-QTL is being developed at the [Bioinformatics and Biostatistics Hub](https://research.pasteur.fr/fr/team/bioinformatics-and-biostatistics-hub/) of Institut Pasteur. CC-QTL is licensed under [GPLv3](https://www.r-project.org/Licenses/GPL-3).