Commit bf5a7e5b authored by Rayan Chikhi (parent d39db073)

and now.. unitigs
# Construction of unitigs using BCALM on AWS Batch
### Source
Adapted from: "Orchestrating an Application Process with AWS Batch using AWS CloudFormation", with all the CodeCommit machinery removed (replaced by manual deployment to ECR via `deploy-docker.sh`).
Amazon Elastic Container Registry (ECR) serves as the Docker container registry. A Lambda function triggers AWS Batch whenever a dataset file is dropped into the S3 bucket.
### Design Considerations
1. The provided CloudFormation template contains all the services needed for this exercise (VPC, a *managed* Batch compute environment, IAM roles, EC2 resources, S3, Lambda; refer to the diagram below) in one single template. In a production scenario, you would ideally split them into different templates (nested stacks) for easier maintenance.
2. The Lambda takes the Batch job's JobQueue and JobDefinition:version as parameters. Once the CloudFormation stack is complete, these are passed as input parameters and set as environment variables for the Lambda. When you deploy a subsequent version of the job, you may need to update the definition:version manually.
3. The example below builds, tags, and pushes the Docker image to the repository created as part of the stack. Optionally, AWS CodeBuild could build from the repository and push the image to ECR instead.
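The Lambda glue described in item 2 can be sketched as follows. This is an illustration, not the handler shipped in the CloudFormation template: the helper name `build_submit_job_request` is hypothetical, and the environment variable names mirror the ones this repo's `batch_processor.py` reads (`InputBucket`, `FileName`, `Region`) and the stack-provided `JobQueue`/`JobDefinition` mentioned above.

```python
# Sketch of a Lambda reacting to an S3 upload and preparing a Batch job
# submission. Hypothetical helper; the real handler lives in the stack.
import os

def build_submit_job_request(event):
    """Turn an S3 put event into kwargs for boto3 batch.submit_job(**kwargs)."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    return {
        "jobName": "unitigs-" + key.replace("/", "-").replace(".", "-"),
        "jobQueue": os.environ["JobQueue"],            # set by the stack
        "jobDefinition": os.environ["JobDefinition"],  # definition:version
        "containerOverrides": {
            "environment": [
                {"name": "InputBucket", "value": bucket},
                {"name": "FileName", "value": key},
                {"name": "Region",
                 "value": os.environ.get("AWS_REGION", "us-east-1")},
            ]
        },
    }

# In the actual Lambda, the handler would then do something like:
# def handler(event, context):
#     import boto3
#     boto3.client("batch").submit_job(**build_submit_job_request(event))
```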
### Steps
Execute the command below to spin up the infrastructure CloudFormation stack. It creates all the AWS infrastructure needed for this exercise.
```
./deploy-docker.sh   # edit the Amazon account ID inside the script first
```
### Testing
Make sure the above step completed. You can review the image in AWS Console > ECR, in the "batch-processing-job-repository" repository.
1. The S3 bucket aws-unitigs-<YOUR_ACCOUNT_NUMBER> is created as part of the stack.
2. Drop a dataset into it. This triggers the Lambda, which submits an AWS Batch job.
3. In AWS Console > Batch, watch the job run and perform the operation based on the pushed container image.
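The output object's name can be traced end-to-end, mirroring the string handling in `batch_processor.py`: bcalm writes `<input minus its last extension>.unitigs.fa`, and MFCompressC appends `.mfc`. A quick sketch:

```python
# How the output key is derived from the uploaded dataset name,
# mirroring batch_processor.py's string handling.
def unitigs_name(input_name):
    # bcalm names its output <input without last extension>.unitigs.fa
    return '.'.join(input_name.split('.')[:-1]) + ".unitigs.fa"

def compressed_name(input_name):
    # MFCompressC appends .mfc to the file it compresses
    return unitigs_name(input_name) + ".mfc"
```

So dropping `reads.fastq` into the bucket should eventually produce `reads.unitigs.fa.mfc` next to it.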
### Code Cleanup
In short:
```
./cleanup.sh
```
Which does:
1. AWS Console > S3 bucket - aws-unitigs-<YOUR_ACCOUNT_NUMBER> - delete the contents of the bucket
2. Run the command below to delete the stack:
```
$ aws cloudformation delete-stack --stack-name batch-processing-job
```
What it doesn't do: AWS Console > ECR - batch-processing-job-repository - the image(s) pushed to the repository must be deleted manually.
## License
This library is licensed under the MIT-0 License. See the LICENSE file.
`deploy-docker.sh` (excerpt):
```
DOCKER_CONTAINER=aws-batch-s3-unitigs-job
REPO=${ACCOUNT}.dkr.ecr.us-east-1.amazonaws.com/${DOCKER_CONTAINER}
TAG=build-$(date -u "+%Y-%m-%d")

echo "Building Docker Image..."
docker build -t $DOCKER_CONTAINER \
  --build-arg AWS_ACCESS_KEY_ID=$(./get-aws-profile.sh --key) \
  --build-arg AWS_SECRET_ACCESS_KEY=$(./get-aws-profile.sh --secret) \
  --build-arg AWS_DEFAULT_REGION=us-east-1 \
  .

echo "Authenticating against AWS ECR..."
# note: `aws ecr get-login` is AWS CLI v1; CLI v2 replaces it with get-login-password
eval $(aws ecr get-login --no-include-email --region us-east-1)

echo "Tagging ${REPO}..."
```
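For illustration, the values the script computes can be approximated in Python. Assumptions: `get-aws-profile.sh` simply reads `~/.aws/credentials` (INI format), and the image reference follows the `REPO`/`TAG` lines above; both helper names below are hypothetical.

```python
# Stdlib approximation of what deploy-docker.sh computes (a sketch).
import configparser
from datetime import datetime, timezone

def read_credentials(path, profile="default"):
    # What get-aws-profile.sh --key / --secret are assumed to return:
    # the two fields of an INI-style AWS credentials file.
    cp = configparser.ConfigParser()
    cp.read(path)
    return cp[profile]["aws_access_key_id"], cp[profile]["aws_secret_access_key"]

def ecr_image_ref(account, container, region="us-east-1", when=None):
    # REPO=${ACCOUNT}.dkr.ecr.<region>.amazonaws.com/${DOCKER_CONTAINER}
    # TAG=build-$(date -u "+%Y-%m-%d")
    when = when or datetime.now(timezone.utc)
    repo = f"{account}.dkr.ecr.{region}.amazonaws.com/{container}"
    return repo + ":build-" + when.strftime("%Y-%m-%d")
```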
`Dockerfile`:
```
FROM python

COPY batch_processor.py /
RUN pip install --upgrade pip && \
    pip install boto3 awscli

# local AWS credentials (build-time only; the ENV variants are left commented
# out so the keys are not kept in the final image environment)
ARG AWS_DEFAULT_REGION
#ENV AWS_DEFAULT_REGION=$AWS_DEFAULT_REGION
ARG AWS_ACCESS_KEY_ID
#ENV AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID
ARG AWS_SECRET_ACCESS_KEY
#ENV AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY

# BCALM install from binaries
RUN aws s3 cp s3://aws-unitigs-tools/bcalm-binaries-v2.2.3-Linux.tar.gz .
RUN tar xf bcalm-binaries-v2.2.3-Linux.tar.gz && rm bcalm-binaries-v2.2.3-Linux.tar.gz
RUN mv bcalm-binaries-v2.2.3-Linux/bin/bcalm ./

# MFCompress
RUN aws s3 cp s3://aws-unitigs-tools/MFCompress-linux64-1.01.tgz .
RUN tar xf MFCompress-linux64-1.01.tgz && rm MFCompress-linux64-1.01.tgz
RUN mv MFCompress-linux64-1.01/MFCompressC ./

CMD ["python", "batch_processor.py"]
```
`batch_processor.py` (excerpt):
```
LOGTYPE_INFO = 'INFO'
LOGTYPE_DEBUG = 'DEBUG'

def process_file(inputBucket, fileName, region):
    #try:
    if True:
        urllib3.disable_warnings()
        s3 = boto3.client('s3')
        # ...
        startTime = datetime.now()

        # download reads from s3
        local_file = str(fileName)
        s3.download_file(inputBucket, fileName, local_file)
        print("downloaded file to", local_file)

        # run bcalm
        os.system(' '.join(["./bcalm", "-kmer-size", "21", "-in", local_file]))
        unitigs_filename = '.'.join(local_file.split('.')[:-1]) + ".unitigs.fa"

        # run mfc
        os.system(' '.join(["./MFCompressC", unitigs_filename]))
        compressed_unitigs_filename = unitigs_filename + ".mfc"

        # upload unitigs to s3
        s3.upload_file(compressed_unitigs_filename, inputBucket, compressed_unitigs_filename)

        endTime = datetime.now()
        diffTime = endTime - startTime
        logMessage(fileName, "File processing time - " + str(diffTime.seconds), LOGTYPE_INFO)
    #except Exception as ex:
    #    logMessage(fileName, "Error processing files:" + str(ex), LOGTYPE_ERROR)

def main():
    inputBucket = ""
    # ...
    #try:
    if "InputBucket" in os.environ:
        inputBucket = os.environ.get("InputBucket")
    if "FileName" in os.environ:
        fileName = os.environ.get("FileName")
    if "Region" in os.environ:
        # ...

def logMessage(fileName, message, logType):
    # ...
    print(logMessageDetails)

def constructMessageFormat(fileName, message, additionalErrorDetails, logType):
    if additionalErrorDetails != "":
        return "fileName: " + fileName + " " + logType + ": " + message + " Additional Details - " + additionalErrorDetails
    else:
        return "fileName: " + fileName + " " + logType + ": " + message

if __name__ == '__main__':
    main()
```
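A side note on the `os.system(' '.join([...]))` calls above: joining the argument list into a shell string breaks on filenames with spaces or shell metacharacters, and the exit status is easy to ignore. A safer drop-in, sketched here (the helper name `run_tool` is hypothetical, not part of the repo):

```python
# Safer replacement for os.system(' '.join([...])): subprocess takes the
# argument list directly (no shell quoting issues), and check=True turns a
# non-zero exit into an exception that the commented-out except block could log.
import subprocess

def run_tool(argv):
    """Run one pipeline step, e.g. ["./bcalm", "-kmer-size", "21", "-in", f]."""
    return subprocess.run(argv, check=True).returncode
```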