Development of an interactive pipeline for Genome wide association analysis


This project focuses on developing an e-infrastructure for accurate Genome Wide Association Study (GWAS) analysis, enabling researchers to efficiently study genetic variations and their associations with diseases. By providing a user-friendly interface and powerful visualization tools, this tool aims to streamline the complex research process and enhance the quality of research outcomes.

Uploaded by medidoc on Feb 14, 2025

Presentation Transcript

Development of an interactive pipeline for Genome wide association analysis
Falola Damilare & Adigun Taiwo
Covenant University Bioinformatics Research, Nigeria (dare.falola@cu.edu.ng & taiwo.adigun@covenantuniversity.edu.ng)
WACREN e-Research Hackfest, Lagos (Nigeria)
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 654237.

Outline: Background information; Scientific problem, aim and benefits; Computational & data model; Implementation strategy; Typical user action workflow; Summary and conclusion.

Background Information: The need to tailor healthcare and treatment therapies to individual patients based on their genetic make-up and other biological features is becoming increasingly important in today's clinical practice. Genome-wide association studies (GWAS) have been applied extensively to uncover genetic variants, known as single nucleotide polymorphisms (SNPs), and genes associated with different diseases, traits and clinical symptoms.

Background Information: Genome-wide association studies involve (1) recruiting many unrelated individuals with and without a specific trait or disease; (2) using high-throughput genotyping technologies to assay hundreds of thousands of SNPs in those individuals; and (3) relating the genotyped SNPs to clinical conditions and measurable traits with appropriate statistical techniques, e.g. the chi-square test or logistic regression, to find which SNPs might be associated with the disease.
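As an illustration of step (3), the chi-square technique mentioned above can be sketched as a basic allelic association test for a single SNP: a 2x2 table of allele counts (cases vs. controls, allele A vs. allele B) and Pearson's chi-square statistic with 1 degree of freedom. This is a minimal Java sketch, not the project's own implementation; the class name, method name and allele counts are hypothetical, and it computes only the statistic, not a p-value.

```java
public class AllelicChiSquare {
    // Pearson chi-square statistic (1 degree of freedom) for a 2x2 allele count table:
    // rows are cases and controls, columns are counts of allele A and allele B.
    static double chiSquare(long caseA, long caseB, long ctrlA, long ctrlB) {
        long[][] observed = {{caseA, caseB}, {ctrlA, ctrlB}};
        double total = caseA + caseB + ctrlA + ctrlB;
        double stat = 0.0;
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++) {
                double rowSum = observed[i][0] + observed[i][1];
                double colSum = observed[0][j] + observed[1][j];
                double expected = rowSum * colSum / total;
                if (expected == 0.0) continue; // an empty row or column contributes nothing
                double diff = observed[i][j] - expected;
                stat += diff * diff / expected;
            }
        }
        return stat;
    }

    public static void main(String[] args) {
        // Hypothetical counts: 100 case alleles and 100 control alleles for one SNP.
        double stat = chiSquare(60, 40, 40, 60);
        System.out.println("chi-square (1 df) = " + stat);
    }
}
```

With these made-up counts every expected cell is 50, so the statistic is 8.0, above the usual 3.84 critical value for p = 0.05 at 1 df. Because a GWAS repeats this test for hundreds of thousands of SNPs, real analyses apply much stricter multiple-testing thresholds.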


[Figure: Typical GWAS workflow]

Scientific Problem, Aim and Benefits:
Problem: A typical GWAS analysis requires numerous complex commands written in different languages, which makes the work difficult for researchers. It also requires large computing and storage resources to perform state-of-the-art GWAS data analysis, and these may not be available to most researchers in Africa or other developing countries.
Aim: The aim of this project is to develop and implement an e-infrastructure that provides state-of-the-art GWAS analysis to local researchers. The infrastructure will integrate all the tools the analysis requires.
Benefits: By making the analysis process a black box, the system allows users to focus mainly on the research problem, which should lead to better and more accurate research results. The solution also adds user interactivity: better visualization of results, quick comparison of results from different types of analysis, and management of several projects.

Computational & Data model: [Figure: computational and data model]

Typical user action workflow: The main users of the system are public health or medical researchers, scientists and bioinformaticians who have genotype and phenotype data to upload, either as raw intensity files (analysis starts at the first phase), in PLINK format (analysis starts at the second phase), or as a list of significant SNPs (analysis starts at the third phase). A typical GWAS analysis involves three main phases: SNP chip genotype calling, association testing and post-GWAS analysis.
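The rule above, where the type of uploaded data determines the phase at which analysis starts, can be expressed as a small routing table. The following is a hypothetical Java sketch: the enum and method names are illustrative and are not taken from the project's code.

```java
import java.util.List;

public class PhaseRouting {
    // Hypothetical input categories matching the three upload options described in the text.
    enum InputType { RAW_INTENSITY, PLINK, SIGNIFICANT_SNPS }

    // The three GWAS phases, declared in execution order.
    enum Phase { GENOTYPE_CALLING, ASSOCIATION_TESTING, POST_GWAS }

    // Maps each kind of uploaded data to the phase where analysis begins.
    static Phase startingPhase(InputType input) {
        switch (input) {
            case RAW_INTENSITY: return Phase.GENOTYPE_CALLING;
            case PLINK: return Phase.ASSOCIATION_TESTING;
            case SIGNIFICANT_SNPS: return Phase.POST_GWAS;
            default: throw new IllegalArgumentException("Unknown input type: " + input);
        }
    }

    // All phases that still have to run, from the starting phase to the last one.
    static List<Phase> remainingPhases(InputType input) {
        Phase[] all = Phase.values();
        return List.of(all).subList(startingPhase(input).ordinal(), all.length);
    }

    public static void main(String[] args) {
        for (InputType t : InputType.values()) {
            System.out.println(t + " -> " + remainingPhases(t));
        }
    }
}
```

Relying on the declaration order of `Phase` keeps the "start here, then run everything after" rule in one place. Adding a phase would only require inserting it at the right position in the enum.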

Typical user action workflow: Phase I includes four stages: initial quality control, genotype calling, post-calling quality control and conversion to PLINK file format. Phase II includes four steps: quality control, population stratification correction, association testing and result visualization. Phase III involves annotating the biologically significant markers that were associated with the disease phenotype in Phase II.
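To make the quality control stages more concrete, here is a minimal Java sketch (Java 16+ for the record type) of one common kind of SNP-level filter: dropping SNPs with a low call rate or a low minor allele frequency. The slides do not say which QC checks or thresholds the pipeline uses. The genotype coding (0/1/2 copies of one allele, -1 for a missing call), the 0.95 and 0.01 thresholds, and all names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SnpQc {
    // Hypothetical coding: each genotype is 0, 1 or 2 copies of one allele; -1 marks a missing call.
    record Snp(String id, int[] genotypes) {}

    // Fraction of samples with a non-missing genotype call.
    static double callRate(int[] g) {
        int called = 0;
        for (int x : g) if (x >= 0) called++;
        return (double) called / g.length;
    }

    // Frequency of the coded allele among called genotypes, folded so the result is <= 0.5.
    static double minorAlleleFrequency(int[] g) {
        int alleles = 0, count = 0;
        for (int x : g) {
            if (x >= 0) { alleles += 2; count += x; }
        }
        if (alleles == 0) return 0.0;
        double p = (double) count / alleles;
        return Math.min(p, 1.0 - p);
    }

    // Keeps only SNPs that pass both thresholds.
    static List<Snp> filter(List<Snp> snps, double minCallRate, double minMaf) {
        List<Snp> kept = new ArrayList<>();
        for (Snp s : snps) {
            if (callRate(s.genotypes()) >= minCallRate
                    && minorAlleleFrequency(s.genotypes()) >= minMaf) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Snp> snps = List.of(
            new Snp("snpA", new int[]{0, 1, 2, 1, 0, 1, 0, 2, 1, 0}),    // passes both checks
            new Snp("snpB", new int[]{0, -1, -1, 0, 1, -1, 0, 0, 1, 0}), // call rate 0.7
            new Snp("snpC", new int[]{0, 0, 0, 0, 0, 0, 0, 0, 0, 0}));   // monomorphic
        for (Snp s : filter(snps, 0.95, 0.01)) System.out.println(s.id());
    }
}
```

In practice, this kind of filtering is usually delegated to established tools such as PLINK. The sketch only shows the logic of the stage.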

Implementation strategy (back-end): Each sub-stage of every phase has been implemented in standalone Bash, Perl and R scripts and Java source code. The business logic of the system will be implemented using Java technologies, including Servlets and JavaServer Pages (JSP). The scripts for each phase will be parallelized using the process input and output declarations of the Nextflow DSL (domain-specific language). Complex stages such as population stratification will be placed in separate Nextflow pipeline scripts. The Java API for RESTful Web Services (JAX-RS) and JavaScript Object Notation (JSON) will be used to give developers programmatic access to the web application. FutureGateway will be used to provide access to distributed computing resources such as grid, cloud and HPC systems.
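One plausible way for the Java back-end to hand a phase over to its Nextflow script is to build a `nextflow run` command line and launch it as an external process. The sketch below only assembles the command. The script paths and the `--input`/`--outdir` parameter names are hypothetical. Nextflow itself does treat double-dash options after the script as pipeline `params`. The sketch also leaves out how FutureGateway would dispatch the job to remote resources.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class NextflowLauncher {
    // Hypothetical mapping from phase number to Nextflow script; file names are illustrative only.
    static Path scriptFor(int phase) {
        switch (phase) {
            case 1: return Path.of("pipelines", "phase1_genotype_calling.nf");
            case 2: return Path.of("pipelines", "phase2_association.nf");
            case 3: return Path.of("pipelines", "phase3_post_gwas.nf");
            default: throw new IllegalArgumentException("No such phase: " + phase);
        }
    }

    // Builds the command; options after the script become Nextflow params (params.input, params.outdir).
    static List<String> command(int phase, Path input, Path outDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add("nextflow");
        cmd.add("run");
        cmd.add(scriptFor(phase).toString());
        cmd.add("--input");
        cmd.add(input.toString());
        cmd.add("--outdir");
        cmd.add(outDir.toString());
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = command(2, Path.of("data", "study.bed"), Path.of("results"));
        System.out.println(String.join(" ", cmd));
        // To actually launch it (requires Nextflow on the PATH):
        // Process p = new ProcessBuilder(cmd).inheritIO().start();
        // int exitCode = p.waitFor();
    }
}
```

Passing the command as a list rather than as a single shell string avoids quoting problems with user-supplied file names.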

Implementation strategy (front-end): Datasets will be uploaded to a storage element via FTP or the Globus Online APIs for Java. gLibrary will be used to manage metadata about the data. HTML5 and JavaScript will be used for the UI design, the interface will be styled with Cascading Style Sheets (CSS), and the system will be made mobile-responsive with CSS3 @media queries. The database will be built on MySQL, a relational database management system (RDBMS).

Summary and conclusions: This solution makes GWAS analysis easier to perform because researchers need only a limited understanding of its computational requirements. This allows them to focus mainly on the research problem and to give better biological interpretation to the results.