AACBB - Workshop on Accelerator Architecture for Computational Biology and Bioinformatics

Schedule


08:30 - 08:40	Opening Remarks
08:40 - 09:20	Keynote 1: Onur Mutlu (ETH, CMU) “Accelerating Genome Analysis: A Primer on an Ongoing Journey” (slides)
09:20 - 09:40	Mohammed Alser+, Hasan Hassan*, Akash Kumar&, Onur Mutlu* and Can Alkan+ (+Bilkent Univ., *ETH Zurich, &TU Dresden) Exploring Speed/Accuracy Trade-offs in Hardware Accelerated Pre-Alignment in Genome Analysis (slides)
09:40 - 10:00	Lisa Wu, Frank Nothaft, Brendan Sweeney, David Bruns-Smith, Sagar Karandikar, Johnny Le, Howard Mao, Krste Asanovic, David Patterson and Anthony Joseph (UC Berkeley) Accelerating Duplicate Marking In The Cloud

10:00 - 10:30	Coffee break

10:30 - 11:10	Invited Talk: Bertil Schmidt (JGU Mainz) “Next-Generation Sequencing: Big Data meets High Performance Computing Architectures”
11:10 - 11:30	Wenqin Huangfu+, Zhenhua Zhu*, Tianqi Tang+, Xing Hu+, Yu Wang* and Yuan Xie+ (+UCSB, *Tsinghua University) GAME: GPU Acceleration of Metagenomics Clustering
11:30 - 11:50	Jose M. Herruzo+, Sonia Gonzalez-Navarro+, Pablo Ibañez*, Victor Viñals*, Jesus Alastruey* and Oscar Plata+ (+Univ. of Malaga, *Univ. of Zaragoza) Exact Alignment with FM-index on the Intel Xeon Phi Knights Landing Processor

11:50 - 13:30	Lunch

13:30 - 14:10	Keynote 2: Srinivas Aluru (Georgia Tech) “Automata Processor and its Applications in Bioinformatics”
14:10 - 14:30	Tommy Tracy II, Jack Wadden, Kevin Skadron and Mircea Stan (UVA) Streaming Gap-Aware Seed Alignment on the Cache Automaton
14:30 - 14:50	Roman Kaplan, Leonid Yavits and Ran Ginosar (Technion) Processing-in-Storage Architecture for Large-Scale Biological Sequence Alignment
14:50 - 15:10	Xueqi Li, Guangming Tan, Yuanrong Wang and Ninghui Sun (ICT) The Genomic Benchmark Suite: Characterization and Architecture Implications

15:10 - 15:30	Coffee break

15:30 - 16:10	Invited Talk: Can Alkan (Bilkent University) "Addressing Computational Burden to Realize Precision Medicine" (slides)
16:10 - 16:30	Sergiu Mosanu and Mircea Stan (UVA) Burrows-Wheeler Short Read Aligner on AWS EC2 F1 (slides)
16:30 - 16:50	Angélica Alejandra Serrano-Rubio, Amilcar Meneses-Viveros, Guillermo B. Morales-Luna and Mireya Paredes-López (CINVESTAV-IPN) Towards BIMAX: Binary Inclusion-MAXimal parallel implementation for gene expression analysis

16:50 - 17:00	Short break

17:00 - 17:15	Meysam Taassori+, Anirban Nag+, Keeton Hodgson+, Ali Shafiee* and Rajeev Balasubramonian+ (+Univ. of Utah, *Samsung Electronics) Memory: The Dominant Bottleneck in Genomic Workloads (slides)
17:15 - 17:30	Meysam Roodi and Andreas Moshovos (Univ. of Toronto) Gene Sequencing: Where Time Goes
17:30 - 17:45	Calvin Bulla, Lluc Alvarez and Miquel Moreto (BSC) Are Next-Generation HPC Systems Ready for Population-level Genomics Data Analytics? (slides)

17:45 - 17:50	Closing remarks

	Social Event
18:15	Bus leaves for the social event (Heurigen)

Keynote Talks

Onur Mutlu, ETH Zurich / CMU

“Accelerating Genome Analysis: A Primer on an Ongoing Journey”

8:40 - 9:20

Talk abstract: Genome analysis is the foundation of many scientific and medical discoveries as well as a key pillar of personalized medicine. Any analysis of a genome fundamentally starts with the reconstruction of the genome from its sequenced fragments. This process is called read mapping. One key goal of read mapping is to find the variations that are present between the sequenced genome and reference genome(s) and to tolerate the errors introduced by the genome sequencing process. Read mapping is currently a major bottleneck in the entire genome analysis pipeline because state-of-the-art genome sequencing technologies are able to sequence a genome much faster than the computational techniques that are employed to reconstruct the genome. New sequencing technologies, like nanopore sequencing, greatly exacerbate this problem while at the same time making genome sequencing much less costly.

This talk describes our ongoing journey in greatly improving the performance of genome read mapping. We first provide a brief background on read mappers that can comprehensively find variations and tolerate sequencing errors. Then, we describe both algorithmic and hardware-based acceleration approaches. Algorithmic approaches exploit the structure of the genome as well as the structure of the underlying hardware. Hardware-based acceleration approaches exploit specialized microarchitectures or new execution paradigms like processing in memory. We show that significant improvements are possible with both algorithmic and hardware-based approaches and their combination. We conclude with a foreshadowing of future challenges brought about by very low cost yet highly error prone new sequencing technologies.
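
As a rough illustration of the read mapping problem described in this abstract, the following minimal sketch indexes a reference by k-mers, uses seeds from the read to propose candidate locations, and verifies each candidate with an edit-distance check. It is not the method presented in the talk. The function names (`build_kmer_index`, `edit_distance`, `map_read`) and the parameters `k` and `max_errors` are hypothetical, and the verification step compares the read only against an equal-length reference window, which is a simplification.

```python
from collections import defaultdict

def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]

def map_read(read, reference, index, k, max_errors):
    """Return sorted (position, distance) pairs within max_errors of the read."""
    candidates = set()
    # Seeding: look up non-overlapping k-mers of the read in the index.
    for offset in range(0, len(read) - k + 1, k):
        for hit in index.get(read[offset:offset + k], []):
            start = hit - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    # Verification: keep candidates whose window is close enough to the read.
    # Assumption: an equal-length window is used, so indels are only
    # approximated by this check.
    results = []
    for start in sorted(candidates):
        dist = edit_distance(read, reference[start:start + len(read)])
        if dist <= max_errors:
            results.append((start, dist))
    return results
```

Calling `map_read(read, reference, build_kmer_index(reference, 4), 4, 2)` returns the candidate positions that survive verification. In this sketch the verification loop is the expensive part, since each candidate costs a full dynamic-programming pass.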

Srinivas Aluru, Georgia Tech

“Automata Processor and its Applications in Bioinformatics”

13:30-14:10

Talk abstract: This talk will introduce the Micron Automata Processor (AP), a novel computing architecture that enables massively parallel execution of numerous non-deterministic finite automata. The processor inspires a new programming paradigm of solving problems using complex pattern matching engines executed over streaming data. The first part of this talk will focus on the processor characteristics, programming and execution environment, and design principles we discovered that are of value in developing applications on the AP. The second part will feature my group's research on developing bioinformatics algorithms for the AP including database search and motif detection.
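
To make the idea of non-deterministic finite automata running over streaming data more concrete, here is a small software sketch of an NFA that reports approximate occurrences of a motif, allowing a bounded number of mismatches. It only simulates the set of active states one input symbol at a time. It does not use the Automata Processor's actual programming interface, and the function name `hamming_nfa_matches` and its parameters are hypothetical.

```python
def hamming_nfa_matches(pattern, stream, max_mismatches):
    """Yield (end_index, mismatches) for each approximate match in the stream."""
    m = len(pattern)
    if m == 0:
        return
    # Each NFA state is (symbols_consumed, mismatches_so_far).
    active = set()
    for pos, symbol in enumerate(stream):
        # The start state is re-armed on every symbol, so a match may begin
        # at any stream position.
        active.add((0, 0))
        next_active = set()
        for consumed, errors in active:
            if symbol == pattern[consumed]:
                next_active.add((consumed + 1, errors))      # match edge
            elif errors < max_mismatches:
                next_active.add((consumed + 1, errors + 1))  # mismatch edge
        # States that consumed the whole pattern are accepting: report them.
        accepting = [e for c, e in next_active if c == m]
        if accepting:
            yield pos, min(accepting)
        active = {(c, e) for c, e in next_active if c < m}
```

All active states advance together on each input symbol, so the stream is read exactly once regardless of how many partial matches are in flight. Here that parallelism is emulated with a Python set, one state at a time.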

Invited Talks

Bertil Schmidt, JGU Mainz

“Next Generation Sequencing: Big Data meets High Performance Computing Architectures”

10:30-11:10

Talk abstract: The progress of NGS has a major impact on medical and genomic research. This high-throughput technology can now produce billions of short DNA fragments in excess of a few Terabytes of data in a single run. This leads to massive datasets used by a wide range of applications including personalized cancer treatment and precision medicine. In addition to the hugely increased throughput, the cost of using high-throughput technologies has been dramatically decreasing. Low sequencing cost of around US$1K per genome has now rendered large population-scale projects feasible. However, in order to make effective use of the produced data, the design of big data algorithms and their efficient implementation on modern HPC systems is required. In this talk, I will present the design of scalable algorithms for metagenomic read classification and for massively parallel hash maps on multi-GPU nodes.

Can Alkan, Bilkent University

"Addressing Computational Burden for Low-Priority Genome Analyses"

15:30-16:10

Talk abstract: The main computational bottleneck of HTS data analysis is mapping the reads to a reference genome, for which clusters are typically used. However, building clusters large enough to handle hundreds of petabytes of data is infeasible. In addition, the reference genome is periodically updated to fix errors and include newly sequenced insertions, so in many large-scale genome projects the reads are realigned to the new reference. We therefore need to explore volunteer grid computing technologies to reduce the need for large clusters. However, since the computational demands of HTS read mapping are substantial and the turnaround of analysis should be fast, we also need a method to motivate volunteers to dedicate valuable resources.

For this purpose, we propose to merge distributed read mapping techniques with popular cryptocurrency protocols. Cryptocurrencies such as Bitcoin calculate a value (called a nonce) to ensure that the creation of new blocks (i.e. “money”) is limited in the system; however, this calculation serves no other practical purpose. Our solution (Coinami) replaces the nonce with a token signed by an authority, which can be acquired by returning the alignment results assigned by the authority. Authorities have two main tasks in our system: 1) inject new problem sets (i.e. “alignment problems”) into the system, and 2) check for the validity of the results to prevent counterfeit

February 24th, 2018

Vienna, Austria

Call For Participation

This workshop focuses on the architecture and design of hardware accelerators for computational biology and bioinformatics problems. The schedule has 4 invited talks and 13 paper presentations. Presentation topics cover the following:

The complete schedule can be found below

Workshop Organizers

From Technion, Israel Institute of Technology

Leonid Yavits

Roman Kaplan