John Collins
Bioinformatics Python Software Engineering
Personalized Genomics & Biomolecular Diagnostics
NGS Pipelines | Data Science | Immuno-Oncology | Custom Full-Stack Web Apps
Written: November 2017
BIOINFORMATICS
First-hand data analysis on hundreds of next-generation high-throughput sequencing runs
Diverse Illumina® expertise (HiSeq™, MiSeq™, NextSeq™, NovaSeq™), spanning secondary through tertiary pipeline analysis for the computational interpretation of raw NGS sequence data, including ultra-deep targeted panels, variant calling and genotyping, single-cell sequencing, unique molecular barcoding, and transcriptomics.
Advanced software development proficiency creating complex custom omics-based analytics using the Python data science stack
Daily use of SciPy, NumPy, pandas, Jupyter, Conda (package and virtual environment management), scikit-learn, and matplotlib in a Linux/UNIX environment, with strong skills in shell scripting, parallel programming, statistics, and machine learning (Theano, Caffe, Keras). Moderate fluency in R and HTML5/CSS3; working knowledge of C/C++, Java, and JavaScript.
Extensive academic and professional history of analysis incorporating myriad open source bioinformatics tools and databases
Mapping: BWA(-MEM), Samtools, Bedtools, Bowtie2, STAR; Annotation: UCSC Genome Browser, GENCODE, Ensembl; NCBI: BLAST, RefSeq, SRA, GEO, dbGaP/dbSNP/dbVar, ClinVar/OMIM; Cancer Genomics: TCGA, COSMIC; QC/Other: the Broad Institute (IGV/GATK/Picard/BD2K/LINCS), NIST’s GIAB (NA12878), Platinum Genomes; R/Bioconductor; Google Genomics API on GCP; 1000 Genomes Project, etc.
Rapidly improving command of state-of-the-art best practices for designing and deploying fast, scalable, high-performance pipelines
Distributed task management for larger-than-memory datasets (e.g., multi-genome) using efficient data structures and MapReduce-based cluster computing.
EXPERIENCE
Bioinformatics Data Scientist
South San Francisco, California
Sept 2018—Sept 2020
Supported bioinformatic data integration for an interdisciplinary immuno-oncology effort spanning the latest technology in protein sciences, gene editing, and tumor immunology, with the goal of creating a first-of-its-kind, patient-tailored, cancer-targeting T cell therapy applicable to all solid tumor types.
Bioinformatics Pipeline Developer
Redwood City, California — Biologics Center
2018
As part of the Biologics Lead Discovery: Data Science & Bioinformatics team at the Immuno-Oncology R&D site, contributed new code and cloud computing infrastructure support to: 1) extend existing internal, wet-lab-facing web apps for protein engineering sequence analysis (Python backend, R Shiny frontend); and 2) build a new NGS pipeline from scratch for antigen receptor repertoire interpretation.
Bioinformatician II, Sequencing Analysis
Digital Biology Center
April/May 2017—July 2017
Built a working framework for a custom secondary and tertiary RNA-Seq analysis pipeline for ddSEQ™, a single-cell, whole-transcriptome assay that uses unique molecular identifiers (UMIs). The assay combines Bio-Rad’s Droplet Digital™ technology for individual cell isolation and 3′-end poly(A)-tail bead capture with Illumina’s Nextera™ “tagmentation” technology (transposase-based fragmentation and adapter tagging in a single step).
Bioinformatics Analyst, CareDx, Inc.
(RA/Bioinformatician, 2014-2016)
November 2014—April 2017
Python-centric genotyping analysis, rigorous analytical validation, and publication-quality visualization for a deep-coverage, targeted SNP-panel NGS diagnostic that measures “percent donor-derived cell-free DNA” (% dd-cfDNA) without prior genotyping of the donor or recipient. Implemented a mix of standard and complex custom software tools, with a focus on assay performance characterization.
PUBLICATIONS
Examples of Python-built NGS analytics are publicly available in the figures of the open-access publication below, which describes a novel liquid biopsy-like assay for quantitative surveillance of allograft “genome transplant dynamics.” The paper was published in the official journal of the Association for Molecular Pathology and selected as the cover feature of the issue presented at the 2016 annual meeting.
Association for Molecular Pathology,
The Journal of Molecular Diagnostics
November 2016, Cover Feature.
Grskovic et al.
Fig. 3A. AlloSure™’s validated Limit of Blank (LoB) is 0.1%, the 95th percentile of 180 “blank” sample results (here meaning 2n genomic ploidy, i.e., samples in which the expected mean fraction of plasma-circulating allogeneic cfDNA is zero percent).
EDUCATION
University of California, Santa Cruz
2013—2014
Master of Science (M.Sc.), Biomolecular Engineering and Bioinformatics
2009—2013
Bachelor of Science (B.S.), Bioengineering
Graduated with Honors in the Major