
We also compare our corpus to OntoNotes Release, since it is likewise a large-scale, manually created corpus project with many kinds of semantic and syntactic annotation. Table summarizes some criteria by which we compare CRAFT to other corpora. Comparison of corpora in terms of total numbers of words/tokens is summarized in Table. Both the complete corpus and its initial release are larger, by token count, than almost all gold-standard annotated corpora for which we could find published numbers, including GENETAG, OntoNotes, GENIA, the PennBioIE Oncology and CYP Corpora, the MedPost Corpus, and BioInfer. The only corpora larger than ours by this criterion are the silver-standard CALBC corpus and the gold-standard ITI TXM PPI and TE Corpora. However, the counts for the ITI TXM corpora include all versions of the subset of documents that were multiply annotated (independently, for IAA calculation), and, as discussed later, not all sections of the component documents of those corpora were annotated.

Corpora can also be compared by the size of the documents annotated, also summarized in Table. Most of the corpora surveyed here are composed of relatively short documents. Among the shortest are individual sentences, which make up the GENETAG, ABGene, and BioInfer corpora.

Bada et al., BMC Bioinformatics, www.biomedcentral.com

Table: Concept annotation characteristics of corpora
Columns: corpus; total # words/tokens; # and type of documents; domain(s); annotation concept schema(s); total # concept annotations.
Corpora compared: CRAFT Corpus (full/initial release), ABGene, BioInfer, CALBC corpus, CLEF Corpus, FetchProt Corpus, i2b2/VA Challenge Corpus, GENETAG, GENIA, GREC, ITI TXM PPI/TE Corpora, MedPost, OntoNotes, PennBioIE Oncology/CYP Corpora, Yapex Corpus.
CRAFT Corpus: documents are articles; domain: sources of MGI annotations of mouse genes/gene products; concept schemas: Open Biomedical Ontologies (CL, ChEBI, SO, PRO, GO BP/CC/MF, NCBITaxon) and Entrez Gene.
Document types across the other corpora include sentences, abstracts, articles, discharge summaries, and newswire documents. Domains include protein-protein interactions, immunology, clinical cancer information, protein tyrosine kinase activity, clinical information, human blood-cell transcription factors, E. coli gene regulation, protein-protein interactions/tissue expression, English and Chinese news, and the genetics of oncology and inhibition of cytochrome P450 enzymes.

BioInfer's token total is reported both with and excluding punctuation. BioInfer has named-entity annotations and annotations of what are termed relationships but that may more appropriately be conceptualized as process or state classes; these are therefore included here in its concept-annotation total.
In the CALBC corpus, NCBI Taxonomy and UMLS concepts were used to mark up species and disease mentions, respectively.
The CLEF Corpus is composed of many kinds of healthcare documents: entire patient records (themselves composed of narratives, imaging reports, histopathology reports,
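To make the token-counting caveats above concrete, here is a minimal Python sketch. It assumes a hypothetical regex tokenizer and hypothetical (doc_id, text) records; it is not the tokenization or data format used by CRAFT, ITI TXM, BioInfer, or any other corpus named here. It shows how counting every independently annotated version of a multiply annotated document (as in the ITI TXM counts) inflates a token total, and how excluding punctuation (as in one of the BioInfer figures) lowers it.

```python
import re
from typing import Iterable

# Hypothetical tokenizer (an assumption, not any corpus's actual scheme):
# runs of word characters are word tokens; every other non-space
# character is a separate punctuation token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
WORD_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text)


def count_tokens(
    versions: Iterable[tuple[str, str]],
    count_each_version: bool = False,
    include_punctuation: bool = True,
) -> int:
    """Sum token counts over hypothetical (doc_id, text) records.

    A document annotated independently several times (e.g. for IAA)
    appears once per annotated version. By default each doc_id is
    counted once; count_each_version=True counts every version.
    """
    seen: set[str] = set()
    total = 0
    for doc_id, text in versions:
        if not count_each_version:
            if doc_id in seen:
                continue  # skip repeated annotated versions of the same document
            seen.add(doc_id)
        tokens = tokenize(text)
        if not include_punctuation:
            # keep only word tokens, dropping punctuation tokens
            tokens = [t for t in tokens if WORD_RE.fullmatch(t)]
        total += len(tokens)
    return total


# Toy data: document d1 was annotated twice, d2 once.
versions = [
    ("d1", "Pax6 binds DNA."),
    ("d1", "Pax6 binds DNA."),
    ("d2", "It is expressed in the lens."),
]

print(count_tokens(versions))                             # 11
print(count_tokens(versions, count_each_version=True))    # 15
print(count_tokens(versions, include_punctuation=False))  # 9
```

The same two documents yield 11 tokens when each is counted once, but 15 when the duplicate annotated version of d1 is counted too, which is why per-version totals are not directly comparable with per-document totals.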


Author: calcimimeticagent