Use this identifier to cite or link to this item: https://locus.ufv.br//handle/123456789/12746
Type: Article
Title: Fangorn Forest (F2): a machine learning approach to classify genes and genera in the family Geminiviridae
Author(s): Silva, José Cleydson F.
Carvalho, Thales F. M.
Fontes, Elizabeth P. B.
Cerqueira, Fabio R.
Abstract: Geminiviruses infect a broad range of cultivated and non-cultivated plants, causing significant economic losses worldwide. Studies of the species diversity, taxonomy, mechanisms of evolution, geographic distribution, and host interactions of these pathogens have increased greatly in recent years. Furthermore, rolling circle amplification (RCA) and advanced metagenomics approaches have enabled the elucidation of viromes and the identification of many viral agents in a large number of plant species. As a result, naming and taxonomically classifying geminiviruses have become complex tasks. In addition, the gene responsible for viral replication (particularly in viruses belonging to the genus Mastrevirus) may be spliced by the transcriptional/splicing machinery of the host cells. However, current tools have limitations in identifying introns. This study proposes a new method, designated Fangorn Forest (F2), based on machine learning (ML) approaches, to classify genera ab initio, i.e., using only the genomic sequence, and to predict and classify genes in the family Geminiviridae. In this investigation, nine genera of the family Geminiviridae and their related satellite DNAs were selected. We built two training sets: one for genus classification, containing attributes extracted from the complete genomes of geminiviruses, and one for gene classification, containing attributes extracted from ORFs taken from those same complete genomes. Three ML algorithms were applied to these datasets to build the predictive models: support vector machines trained with sequential minimal optimization, random forest (RF), and multilayer perceptron.
RF demonstrated very high predictive power, achieving precision, recall, and area under the curve (AUC) of 0.966, 0.964, and 0.995, respectively, for genus classification. For gene classification, RF reached precision, recall, and AUC of 0.983, 0.983, and 0.998, respectively. Fangorn Forest is therefore an efficient method for classifying genera of the family Geminiviridae with high precision and for effective gene prediction and classification. The method is freely accessible at www.geminivirus.org:8080/geminivirusdw/discoveryGeminivirus.jsp.
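As a reading aid for the precision and recall figures reported in the abstract, the following minimal sketch shows how per-class precision and recall can be computed for a multi-class classifier and then averaged. It is not the authors' evaluation code. The function names, the toy labels, and the choice of a support-weighted average are all assumptions made for illustration; the abstract does not state how the reported scores were averaged across genera or genes.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for multi-class predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    predicted = Counter(y_pred)  # how often each label was predicted
    actual = Counter(y_true)     # how often each label truly occurs
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        recall = true_pos[label] / actual[label] if actual[label] else 0.0
        scores[label] = (precision, recall)
    return scores


def weighted_average(scores, y_true):
    """Average per-class scores, weighting each class by its true count.

    Assumption: support-weighted averaging is one common convention;
    the abstract does not say which averaging the study used.
    """
    support = Counter(y_true)
    total = len(y_true)
    precision = sum(scores[label][0] * support[label] for label in scores) / total
    recall = sum(scores[label][1] * support[label] for label in scores) / total
    return precision, recall


# Toy labels for illustration only; these are not data from the study.
y_true = ["Begomovirus", "Begomovirus", "Mastrevirus", "Mastrevirus", "Curtovirus"]
y_pred = ["Begomovirus", "Mastrevirus", "Mastrevirus", "Mastrevirus", "Curtovirus"]

scores = per_class_precision_recall(y_true, y_pred)
avg_precision, avg_recall = weighted_average(scores, y_true)
print(scores)
print(round(avg_precision, 3), round(avg_recall, 3))
```

In this toy case one Begomovirus genome is misclassified as Mastrevirus, which lowers recall for Begomovirus and precision for Mastrevirus while leaving Curtovirus unaffected. Note that the support-weighted recall equals overall accuracy.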
Keywords: Geminivirus; machine learning
Gene classification
Genus classification
Random Forest
Multilayer perceptron
Support vector machines
Publisher: BioMed Central Bioinformatics
Access type: Open Access
URI: https://doi.org/10.1186/s12859-017-1839-x
http://www.locus.ufv.br/handle/123456789/12746
Issue date: 30-Sep-2017
Appears in collections: Articles

Files associated with this item:
File: document(1).pdf
Description: full text
Size: 2.67 MB
Format: Adobe PDF


Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.