Commit 9e7af411 authored by David BIKARD

Merge branch 'dev' into 'master'

PyPI publish package

See merge request !2
parents 475f4319 3e6a8eab
Pipeline #22644 passed
.vscode
*.code-workspace
*.pyc
dist
*.egg-info
__pycache__
image: python:3.7
stages:
  - lint
  - test
  - deploy
before_script:
  - pip install poetry
  - poetry install -v
lint:
  before_script:
    - pip install flake8
  script:
    - flake8 --max-line-length=88 crisprbact
test:
  stage: test
  script:
    - poetry run pytest
deploy:
  stage: deploy
  script:
    - poetry config http-basic.pypi $PYPI_USERNAME $PYPI_PWD
    - poetry build -v
    - poetry publish
  rules:
    - if: '$CI_COMMIT_TAG =~ /^\d+\.\d+\.\d+$/'
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.4.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
  - repo: https://github.com/ambv/black
    rev: stable
    hooks:
      - id: black
        exclude: .hooks
        language_version: python3.7
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.4.0
    hooks:
      - id: flake8
        args: [--max-line-length=89]
# CRISPRbact
Tools to design and analyse CRISPRi experiments
\ No newline at end of file
**Tools to design and analyse CRISPRi experiments in bacteria.**
CRISPRbact currently contains an on-target activity prediction tool for the Streptococcus pyogenes dCas9 protein.
This tool takes as an input the sequence of a gene of interest and returns a list of possible target sequences with the predicted on-target activity. Predictions are made using a linear model fitted on data from a genome-wide CRISPRi screen performed in E. coli (Cui et al. Nature Communications, 2018). The model predicts the ability of dCas9 to block the RNA polymerase when targeting the non-template strand (i.e. the coding strand) of a target gene.
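As a rough illustration of the prediction step described above, the sketch below one-hot encodes a 25-nt target, drops the invariant GG of the PAM (as `on_target_predict` does in this commit), and applies a linear model. The `features` and `linear_predict` helpers, the coefficient values, and the zero intercept are assumptions made for illustration only; the fitted coefficients ship with the package in `reg_coef.pkl`, and the package's own `predict` function is not reproduced here.

```python
BASES = ["A", "T", "G", "C"]  # same base order as `bases` in the package


def encode(seq):
    """One-hot encode a sequence: one row per base in BASES, one column per position."""
    return [[int(b == p) for b in seq] for p in BASES]


def features(target):
    """Drop the GG of the PAM (indices 7 and 8), then flatten the encoding row by row."""
    trimmed = target[:7] + target[9:]
    return [value for row in encode(trimmed) for value in row]


def linear_predict(x, coef, intercept=0.0):
    """Linear model: intercept plus the dot product of coefficients and features.

    HYPOTHETICAL helper: the real model's exact form is an assumption here.
    """
    return intercept + sum(w * v for w, v in zip(coef, x))


target = "TCATCACGGGCCTTCGCCACGCGCG"  # first target from the README example
x = features(target)

# HYPOTHETICAL coefficients, used only so that the sketch runs end to end.
coef = [0.01 * i for i in range(len(x))]
score = linear_predict(x, coef)

print(len(x))  # 4 bases x 23 positions = 92 features
print(score)   # meaningless value: the coefficients are placeholders
```

Each of the 23 remaining positions contributes exactly one non-zero feature, so the prediction is a sum of one coefficient per position.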
## Getting Started
### Installation
For the moment, this package can only be installed from PyPI.
#### PyPI
```console
$ pip install crisprbact
$ crisprbact --help
```
```
Usage: crisprbact [OPTIONS] COMMAND [ARGS]...
Options:
-v, --verbose
--help Show this message and exit.
Commands:
predict
```
### API
Use this library in your Python code:
```python
from crisprbact import on_target_predict
guide_rnas = on_target_predict("ACCACTGGCGTGCGCGTTACTCATCAGATGCTGTTCAATACCGATCAGGTTATCGAAGTGTTTGTGATTGTTTGCCGCGCGCGTGGCGAAGGCCCGTGATGAAGGAAAAGTTTTGCGCTATGTTGGCAATATTGATGAAG")
for guide_rna in guide_rnas:
    print(guide_rna)
```
_Output:_
```
{'target': 'TCATCACGGGCCTTCGCCACGCGCG', 'guide': 'TCATCACGGGCCTTCGCCAC', 'start': 82, 'stop': 102, 'pam': 80, 'ori': '-', 'pred': -0.4719254873780802}
{'target': 'CATCACGGGCCTTCGCCACGCGCGC', 'guide': 'CATCACGGGCCTTCGCCACG', 'start': 81, 'stop': 101, 'pam': 79, 'ori': '-', 'pred': 1.0491308060379676}
{'target': 'CGCGCGCGGCAAACAATCACAAACA', 'guide': 'CGCGCGCGGCAAACAATCAC', 'start': 63, 'stop': 83, 'pam': 61, 'ori': '-', 'pred': -0.9021152826078697}
{'target': 'CCTGATCGGTATTGAACAGCATCTG', 'guide': 'CCTGATCGGTATTGAACAGC', 'start': 29, 'stop': 49, 'pam': 27, 'ori': '-', 'pred': 0.23853258873311955}
```
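The four `target` values above can be reproduced with a minimal sketch of the search step. It reuses the lookahead pattern from `find_targets` in this commit (6 nt, an NGG PAM, then 16 nt, matched on the reverse complement so that overlapping sites are all reported). The `rev_comp` body and the `find_target_sequences` name are stand-ins written for this example, and the coordinate fields (`start`, `stop`, `pam`) are deliberately left out because their formulas are not shown in the diff.

```python
import re

# Stand-in reverse complement; the package's own rev_comp is not shown in this diff.
COMPLEMENT = str.maketrans("ATGC", "TACG")


def rev_comp(seq):
    return seq.translate(COMPLEMENT)[::-1]


PAM = "[ATGC]GG"
# The zero-width lookahead lets overlapping 25-nt windows each produce a match.
TARGET_RE = re.compile("(?=([ATGC]{6}" + PAM + "[ATGC]{16}))")


def find_target_sequences(seq):
    """Return every 25-nt candidate target found on the reverse complement of seq.

    HYPOTHETICAL helper name; only the regular expression comes from the package.
    """
    seq = re.sub(r"\s", "", seq.upper())  # same normalisation as on_target_predict
    return [m.group(1) for m in TARGET_RE.finditer(rev_comp(seq))]


gene = (
    "ACCACTGGCGTGCGCGTTACTCATCAGATGCTGTTCAATACCGATCAGGTTATCGAAGTGTTTGTGATTG"
    "TTTGCCGCGCGCGTGGCGAAGGCCCGTGATGAAGGAAAAGTTTTGCGCTATGTTGGCAATATTGATGAAG"
)
for target in find_target_sequences(gene):
    print(target)
```

Without the lookahead, `re.finditer` would skip the second target here, because it overlaps the first one by 24 nt.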
### Command line interface
#### Predict guide RNA activity
Input the sequence of a target gene and this script will output candidate guide RNAs for the S. pyogenes dCas9 with predicted on-target activity.
```console
$ crisprbact predict --help
```
```
Usage: crisprbact predict [OPTIONS] COMMAND [ARGS]...
Options:
--help Show this message and exit.
Commands:
from-seq Outputs candidate guide RNAs for the S.
from-str Outputs candidate guide RNAs for the S.
```
##### From a string sequence:
The target input sequence can be a simple string.
```console
$ crisprbact predict from-str --help
```
```
Usage: crisprbact predict from-str [OPTIONS] [OUTPUT_FILE]
Outputs candidate guide RNAs for the S. pyogenes dCas9 with predicted on-
target activity from a target gene.
[OUTPUT_FILE] file where the candidate guide RNAs are saved. Default =
"stdout"
Options:
-t, --target TEXT [required]
--help Show this message and exit.
```
```console
$ crisprbact predict from-str -t ACCACTGGCGTGCGCGTTACTCATCAGATGCTGTTCAATACCGATCAGGTTATCGAAGTGTTTGTGATTGTTTGCCGCGCGCGTGGCGAAGGCCCGTGATGAAGGAAAAGTTTTGCGCTATGTTGGCAATATTGATGAAG guide-rnas.tsv
```
Output file `guide-rnas.tsv`:
The `seq_id` column is `N/A` because the input is a plain string rather than a sequence record.
```
target PAM position prediction seq_id
TCATCACGGGCCTTCGCCACGCGCG 80 -0.4719254873780802 N/A
CATCACGGGCCTTCGCCACGCGCGC 79 1.0491308060379676 N/A
CGCGCGCGGCAAACAATCACAAACA 61 -0.9021152826078697 N/A
CCTGATCGGTATTGAACAGCATCTG 27 0.23853258873311955 N/A
```
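A file like this can be loaded back into Python with the standard `csv` module. The sketch below assumes the columns are tab-separated (the CLI joins fields with `"\t"`) and uses an in-memory copy of the rows above instead of reading `guide-rnas.tsv`. The `read_guides` helper is hypothetical, and sorting by descending `prediction` is only an example: which direction indicates stronger repression is not stated here.

```python
import csv
import io

# In-memory copy of the example output; a real script would open guide-rnas.tsv instead.
tsv_text = (
    "target\tPAM position\tprediction\tseq_id\n"
    "TCATCACGGGCCTTCGCCACGCGCG\t80\t-0.4719254873780802\tN/A\n"
    "CATCACGGGCCTTCGCCACGCGCGC\t79\t1.0491308060379676\tN/A\n"
    "CGCGCGCGGCAAACAATCACAAACA\t61\t-0.9021152826078697\tN/A\n"
    "CCTGATCGGTATTGAACAGCATCTG\t27\t0.23853258873311955\tN/A\n"
)


def read_guides(handle):
    """Parse the TSV into dicts with typed fields (HYPOTHETICAL helper)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {
            "target": row["target"],
            "pam": int(row["PAM position"]),
            "pred": float(row["prediction"]),
            "seq_id": row["seq_id"],
        }
        for row in reader
    ]


guides = read_guides(io.StringIO(tsv_text))
# Example ordering only; flip `reverse` if lower scores are the ones of interest.
ranked = sorted(guides, key=lambda g: g["pred"], reverse=True)
for guide in ranked:
    print(guide["target"], guide["pam"], guide["pred"])
```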
You can also pipe the results, for example to count the candidate guide RNAs (the header line is skipped):
```console
$ crisprbact predict from-str -t ACCACTGGCGTGCGCGTTACTCATCAGATGCTGTTCAATACCGATCAGGTTATCGAAGTGTTTGTGATTGTTTGCCGCGCGCGTGGCGAAGGCCCGTGATGAAGGAAAAGTTTTGCGCTATGTTGGCAATATTGATGAAG | tail -n +2 | wc -l
```
##### From a sequence file
```console
$ crisprbact predict from-seq --help
```
```
Usage: crisprbact predict from-seq [OPTIONS] [OUTPUT_FILE]
Outputs candidate guide RNAs for the S. pyogenes dCas9 with predicted on-
target activity from a target gene.
[OUTPUT_FILE] file where the candidate guide RNAs are saved. Default =
"stdout"
Options:
-t, --target FILENAME Sequence file [required]
-f, --seq-format [fasta|fa|gb|genbank]
Sequence file format [default: fasta]
--help Show this message and exit.
```
- FASTA file (may be a multi-FASTA file)
```console
$ crisprbact predict from-seq -t /tmp/seq.fasta guide-rnas.tsv
```
- GenBank file
```console
$ crisprbact predict from-seq -t /tmp/seq.gb -f gb guide-rnas.tsv
```
##### Output file
```
target PAM position prediction seq_id
ATTTGTTGGCAACCCAGCCAGCCTT 855 -0.7310112260341689 CP027060.1
CACGTCCGGCAATATTTCCGCGAAC 830 0.14773859036983505 CP027060.1
TCCGAGCGGCAACGTCTCTGATAAC 799 -0.4922487382950619 CP027060.1
GCTTAAAGGGCAAAATGTCACGCCT 769 -1.814666749464254 CP027060.1
CTTAAAGGGCAAAATGTCACGCCTT 768 -0.4285147731290152 CP027060.1
CGTTTGAGGAGATCCACAAAATTAT 732 -1.2437430146548256 CP027060.1
CATGAATGGTATAAACCGGCGTGCC 680 -0.8043242669169294 CP027060.1
```
## Contributing
### Clone repo
```console
$ git clone https://gitlab.pasteur.fr/dbikard/crisprbact.git
```
### Create a virtualenv
```console
$ virtualenv -p python3.7 .venv
$ . .venv/bin/activate
$ pip install poetry
```
### Install crisprbact dependencies
```console
$ poetry install
```
### Install hooks
Install the pre-commit hooks so that flake8 and black run on each commit.
```console
$ pre-commit install
```
from .predict import on_target_predict
__all__ = ["on_target_predict"]
from crisprbact import on_target_predict
from Bio import SeqIO
import click
class Config(object):
    def __init__(self):
        self.verbose = False


pass_config = click.make_pass_decorator(Config, ensure=True)
HEADER = ["target", "PAM position", "prediction", "seq_id"]


@click.group()
@click.option("-v", "--verbose", is_flag=True)
@pass_config
def main(config, verbose):
    config.verbose = verbose


@main.group()
@pass_config
def predict(config):
    pass


@predict.command()
@click.option("-t", "--target", type=str, required=True)
@click.argument("output-file", type=click.File("w"), default="-")
@pass_config
def from_str(config, target, output_file):
    """
    Outputs candidate guide RNAs for the S. pyogenes dCas9 with predicted on-target
    activity from a target gene.

    [OUTPUT_FILE] file where the candidate guide RNAs are saved. Default = "stdout"
    """
    if config.verbose:
        print_parameters(target)
    guide_rnas = on_target_predict(target)
    click.echo("\t".join(HEADER), file=output_file)
    write_guide_rnas(guide_rnas, output_file)


@predict.command()
@click.option(
    "-t", "--target", type=click.File("rU"), required=True, help="Sequence file"
)
@click.option(
    "-f",
    "--seq-format",
    type=click.Choice(["fasta", "fa", "gb", "genbank"]),
    help="Sequence file format",
    default="fasta",
    show_default=True,
)
@click.argument("output-file", type=click.File("w"), default="-")
@pass_config
def from_seq(config, target, seq_format, output_file):
    """
    Outputs candidate guide RNAs for the S. pyogenes dCas9 with predicted on-target
    activity from a target gene.

    [OUTPUT_FILE] file where the candidate guide RNAs are saved. Default = "stdout"
    """
    fg = "blue"
    if config.verbose:
        print_parameters(target.name, fg)
    click.echo("\t".join(HEADER), file=output_file)
    for record in SeqIO.parse(target, seq_format):
        if config.verbose:
            click.secho(" - search guide RNAs for %s " % record.id, fg=fg)
        guide_rnas = on_target_predict(str(record.seq))
        write_guide_rnas(guide_rnas, output_file, record.id)


def print_parameters(target, fg="blue"):
    click.secho("[Verbose mode]", fg=fg)
    click.secho("Target sequence : %s" % target, fg=fg)


def write_guide_rnas(guide_rnas, output_file, seq_id="N/A"):
    for guide_rna in guide_rnas:
        # click.echo(guide_rna)
        click.echo(
            "\t".join(
                [
                    guide_rna["target"],
                    str(guide_rna["pam"]),
                    str(guide_rna["pred"]),
                    seq_id,
                ]
            ),
            file=output_file,
        )


if __name__ == "__main__":
    main()
import numpy as np
import gffpandas.gffpandas as gffpd
from typing import Tuple
import re
from importlib.resources import open_binary
with open("on_target/model/reg_coef.pkl", "br") as handle:
with open_binary("crisprbact", "reg_coef.pkl") as handle:
coef = np.load(handle, allow_pickle=True)
bases = ["A", "T", "G", "C"]
def encode(seq):
'''One-hot encoding of a sequence (only non-ambiguous bases (ATGC) accepted)'''
"""One-hot encoding of a sequence (only non-ambiguous bases (ATGC) accepted)"""
return np.array([[int(b == p) for b in seq] for p in bases])
......@@ -28,7 +28,7 @@ def find_targets(seq):
repam = "[ATGC]GG"
L = len(seq)
seq_revcomp = rev_comp(seq)
alltargets = [
return (
dict(
[
("target", m.group(1)),
......@@ -40,17 +40,19 @@ def find_targets(seq):
]
)
for m in re.finditer("(?=([ATGC]{6}" + repam + "[ATGC]{16}))", seq_revcomp)
]
return alltargets
)
def on_target_predict(seq):
seq = seq.upper() # make uppercase
seq = re.sub(r"\s", "", seq) # removes white space
alltargets = find_targets(seq)
alltargets = list(find_targets(seq))
if alltargets:
X = np.array(
[encode(tar["target"][:7] + tar["target"][9:]) for tar in alltargets] #encore and remove GG of PAM
[
encode(tar["target"][:7] + tar["target"][9:]) for tar in alltargets
] # encode and remove GG of PAM
)
X = X.reshape(X.shape[0], -1)
preds = predict(X)
......
File moved
This diff is collapsed.
[tool.poetry]
name = "crisprbact"
version = "0.1.0"
license = "GPL-3.0"
description = "Tools to design and analyse CRISPRi experiments"
authors = ["David Bikard <david.bikard@pasteur.fr>", "Remi Planel <rplanel@pasteur.fr>"]
keywords = ["CRISPR", "genomics", "bacteria", "CRISPRi", "screen"]
homepage = "https://gitlab.pasteur.fr/dbikard/crisprbact"
classifiers = [
"Environment :: Console",
"Operating System :: POSIX :: Linux",
"Intended Audience :: Science/Research",
"Programming Language :: Python :: 3",
"Topic :: Scientific/Engineering :: Bio-Informatics",
"Natural Language :: English",
"License :: OSI Approved :: GNU General Public License v3 (GPLv3)"
]
readme = "README.md"
[tool.poetry.dependencies]
python = "^3.7"
numpy = "^1.17"
click = "^7.0"
biopython = "^1.75"
[tool.poetry.dev-dependencies]
pytest = "^5.2"
flake8 = "^3.7"
pre-commit = "^1.20.0"
black = "^19.10b0"
[tool.poetry.scripts]
crisprbact= "crisprbact.cli:main"
[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"
[tool.black]
target-version = ['py37']
include = '\.pyi?$'
exclude = '''
(
/(
\.eggs # exclude a few common directories in the
| \.git # root of the project
| \.hg
| \.mypy_cache
| \.tox
| \.venv
| \.hooks
| _build
| buck-out
| build
| dist
)/
| foo.py # also separately exclude a file named foo.py in
# the root of the project
)
'''
[flake8]
max-line-length = 89
max-complexity = 18
import pytest
from on_target.model.predict import on_target_predict
import crisprbact
def test_on_target_predict_empty():
# Empty sequence
predicted_target = on_target_predict("")
predicted_target = crisprbact.on_target_predict("")
assert len(predicted_target) == 0, "the list is non empty"
def test_on_target_predict_size_guide():
size_guide = 20
predicted_targets = on_target_predict(
"TGCCTGTTTACGCGCCGATTGTTGCGAGATTTGGACGGACGTTGACGGGGTCTATACCTGCGACCCGCGTCAGGTGCCCGATGCGAGGTTGTTGAAGTCGATGTCCTACCAGGAAGCGATGGAGCTTTCCTACTTCGGCG"
predicted_targets = crisprbact.on_target_predict(
""" TGCCTGTTTACGCGCCGATTGTTGCG
AGATTTGGACGGACGTTGACGGGG
TCTATACCTGCGACCCGCGTCAGG
TGCCCGATGCGAGGTTGTTGAAGT
CGATGTCCTACCAGGAAGCGATGG
AGCTTTCCTACTTCGGCG"""
)
guides = (predicted_target["guide"] for predicted_target in predicted_targets)
for guide in guides:
......@@ -27,4 +31,3 @@ def test_on_target_predict_size_guide():
assert (
start_val - pam_val == 2
), "the difference between start and pam position is different than 2"