BioFactHMM: MULTIDIMENSIONAL MODELING OF BIOLOGICAL DATA FROM HIDDEN MARKOV MODEL GENERATED DATASETS

Pradhan, M R, Mago, B and Kalra, D (2020) BioFactHMM: MULTIDIMENSIONAL MODELING OF BIOLOGICAL DATA FROM HIDDEN MARKOV MODEL GENERATED DATASETS. Indian Journal of Computer Science and Engineering, 11 (4). pp. 383-393. ISSN 22313850

Abstract

Ever-growing biological research generates large volumes of biological data and knowledge bases, ranging from clinical test results to genome analysis. The dynamic changes in genome sequences and the complexity of these databases and their relationships pose many challenges for data analysis. Many online databases are available for biological studies. It is essential that biological data can be analyzed in a multidimensional way, by building a data warehouse and then applying online analytical processing (OLAP). The star schema, a common method of multidimensional modeling, is not sufficient for biological data because it cannot cater for more relationships. The snowflake schema represents relationships among datasets better than the star schema, but it cannot model all data from all databases, especially the hidden states of long new biological sequences or complex medical data. Given this scenario, the idea presented in this paper combines generating datasets with a Hidden Markov Model (HMM) from all types of biological databases available online with the use of the fact constellation schema of data warehouse modeling. The HMM is adopted in this study to find new datasets and to help analyze the relationships between them. Once the datasets are generated, multidimensional modeling with the fact constellation schema is carried out to build the data warehouse. The new model proposed in this work is called the BioFactHMM schema; it is proposed specifically for biological data and is a mix of the star and snowflake schemas. The model aims to capture all the semantics of biological sequences from various data sources using the HMM. Data warehouse modeling is then done following the design principles of the fact constellation schema. Subsequently, OLAP cube analysis is used to view the data and reports in a multidimensional way.

Affiliation: Skyline University College
SUC Author(s): Pradhan, M R ORCID: https://orcid.org/0000-0002-0115-2722, Mago, B ORCID: https://orcid.org/0000-0003-1537-1202 and Kalra, D
All Author(s): Pradhan, M R, Mago, B and Kalra, D
Item Type: Article
Uncontrolled Keywords: HMM, Multidimensional, Genome data, Biological data, Data Warehouse, Data Modeling, Fact Constellation, Biological databases
Subjects: B Information Technology > BT Data Management
Divisions: Skyline University College > School of IT
Depositing User: Mr Veeramani Rasu
Date Deposited: 24 Nov 2021 14:51
Last Modified: 24 Nov 2021 14:51
URI: https://research.skylineuniversity.ac.ae/id/eprint/35
Publisher URL: https://doi.org/10.21817/indjcse%2F2020%2Fv11i4%2F...
Publisher OA policy: https://v2.sherpa.ac.uk/id/publication/14571