Text mining uses techniques that discover patterns and trends to obtain high-quality information from input text.
Andrew Clegg of the Shepherd Group is developing methods for extracting information from bioinformatics data resources (e.g. molecular biology journal articles) using text mining techniques. Because recognizing and identifying gene and protein names is a challenge in this area, a system called BioNERD has been developed and integrated into the system.
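To give a feel for what "recognizing gene and protein names" involves, here is a minimal sketch of the simplest possible approach: looking up word sequences in a dictionary, preferring the longest match. This is NOT how BioNERD works (its method isn't described here); the lexicon, the function names and the `max_len` parameter are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical toy lexicon mapping lower-cased names to an entity type.
# Not BioNERD's actual resource.
LEXICON = {
    "p53": "GENE",
    "tumor necrosis factor": "PROTEIN",
    "tnf": "PROTEIN",
    "brca1": "GENE",
}


def tokenize(text):
    # Keep runs of word characters as tokens; punctuation is dropped.
    return re.findall(r"\w+", text)


def tag_entities(text, lexicon=LEXICON, max_len=4):
    """Return (surface form, type) pairs found by longest-match dictionary lookup."""
    tokens = tokenize(text)
    lowered = [t.lower() for t in tokens]
    entities = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest span first so multi-word names beat shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(lowered[i:i + n])
            if candidate in lexicon:
                match = (" ".join(tokens[i:i + n]), lexicon[candidate])
                i += n
                break
        if match:
            entities.append(match)
        else:
            i += 1
    return entities


print(tag_entities("Tumor necrosis factor induces p53 but not BRCA1."))
# [('Tumor necrosis factor', 'PROTEIN'), ('p53', 'GENE'), ('BRCA1', 'GENE')]
```

A plain dictionary lookup like this misses spelling variants and new names and can't tell a gene name from an ordinary word with the same spelling, which is exactly why dedicated recognizers such as BioNERD are needed.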
Another problem he faces is natural language itself: there are many different ways of expressing the same thing.
His solution, as quoted from his site, is: "parsing the sentence with a phrase-structure parser, mapping the resulting syntax tree into a dependency graph where each node is a word and each arc a grammatical relation (see image), and identifying subgraphs covering two or more entities which are characteristic of genuine relationships."
In layman's terms, the technique splits a sentence into smaller pieces, works out how those pieces relate to each other, and uses those relationships to draw out information.
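The dependency-graph idea can be sketched in a few lines. The sketch below assumes the parse has already been done: the arcs for the sentence "BRCA1 interacts with RAD51 in cells" are written out by hand (a real system would get them from a parser), and the relation labels are Stanford-style names used only as an assumption. It picks the subgraph connecting two entities by taking the shortest path between them, which is one common choice and not necessarily Clegg's exact method; `INTERACTION_PATTERNS` is a hypothetical stand-in for the "characteristic" subgraphs his system would learn or define.

```python
from collections import deque

# Hand-written dependency arcs (head, relation, dependent). Hypothetical parse.
# Words are used directly as node ids; a real system would use token positions
# so that repeated words stay distinct.
ARCS = [
    ("interacts", "nsubj", "BRCA1"),
    ("interacts", "prep_with", "RAD51"),
    ("interacts", "prep_in", "cells"),
]

# Hypothetical relation paths treated as evidence of a genuine interaction.
INTERACTION_PATTERNS = {("nsubj", "prep_with")}


def build_graph(arcs):
    # Store each arc in both directions so the search can ignore arc direction.
    graph = {}
    for head, rel, dep in arcs:
        graph.setdefault(head, []).append((dep, rel))
        graph.setdefault(dep, []).append((head, rel))
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of (from, relation, to) steps or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for neighbour, rel in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = (node, rel)
                queue.append(neighbour)
    if goal not in previous:
        return None
    # Walk back from the goal to the start to recover the path.
    path = []
    node = goal
    while previous[node] is not None:
        parent, rel = previous[node]
        path.append((parent, rel, node))
        node = parent
    return list(reversed(path))


def relation_signature(path):
    # Reduce a path to its sequence of grammatical relations.
    return tuple(rel for _, rel, _ in path)


path = shortest_path(build_graph(ARCS), "BRCA1", "RAD51")
print(path)
# [('BRCA1', 'nsubj', 'interacts'), ('interacts', 'prep_with', 'RAD51')]
print(relation_signature(path) in INTERACTION_PATTERNS)
# True
```

Comparing relation sequences rather than raw word order is what helps with the "many ways of saying the same thing" problem: differently worded sentences can still yield the same grammatical path between the two entities.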
Once completed, BOOTStrep will be available for public use in a number of languages. Furthermore, the system itself will be able to validate its data automatically for accuracy and originality.
With both of these technologies in place, we will be able to extract valuable information from journals and other documents without actually reading them, saving precious time for our coding and other research. With the ever-growing body of biological information, we really need these services to help us keep track of the biological knowledge we have accumulated over time; otherwise many discoveries could be overlooked simply because there is not enough human effort available to actively seek out what others have newly found.
Sources: Andrew B. Clegg's projects; BOOTStrep project website.