I mean, if somebody includes the name of a gene in their text, the goal is to annotate that gene mention (and then use the annotation). It's often nontrivial to extract the meaning from the text without a human expert reading it. A typical case is a reference to a gene by name where the authors really meant the transcript or the protein product.
The semantic parsing part is an annotation process. AI algorithms need a huge amount of context to make accurate annotations.
So if authors instead marked up their papers such that each mentioned entity's semantic meaning was obvious, it would be much easier for an AI that scans all papers and generates hypotheses.
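As a sketch of what that markup could look like: these are hypothetical LaTeX macros (no such standard package exists), and the identifiers are shown purely for illustration. The point is that each mention carries a database ID that pins down exactly which entity is meant, so a parser never has to guess gene vs. protein.

```latex
% Hypothetical semantic macros -- the markup prints the name but
% attaches a machine-readable identifier to every mention.
\newcommand{\gene}[2]{#1}      % usage: \gene{BRCA1}{HGNC:1100}
\newcommand{\protein}[2]{#1}   % usage: \protein{BRCA1}{UniProt:P38398}

Overexpression of \gene{BRCA1}{HGNC:1100} mRNA was observed,
although the \protein{BRCA1}{UniProt:P38398} level was unchanged.
```

A downstream tool could then extract the second argument of every mention instead of running a named-entity disambiguator over free text.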
I'm thinking of a less ambitious angle for AI, or at least for parsing algorithms: cross-referencing commonly researched subjects and methods. If, say, a certain method seems to yield less-than-ideal results, it would be nice to know whether someone else has already figured out the problem, well in advance of any laboratory work. Feeding that sort of information into a computer should be dead simple, since I assume most methods are easily categorized as it is.
No, I don't think it's dead simple to take methods and compare them. The problem is that most method descriptions are implicit and leave out many of the details that would be required to replicate a study.
Thank you very much for unpacking it and all the best in your career in Biology! I am in tech now, but studied Biology at Texas A&M for undergrad so hearing the words in your response reminded me of the good ol' days!
What he/she is saying is that the pace of research and publishing is such that it's impossible for researchers and academics to stay current the old-fashioned way (reading, writing, and attending conferences). It would be beneficial to have an intelligent bot that could be trained to browse for content of interest, as a mechanism to augment an individual human's capacity to do this manually.
But wouldn't a different data structure/database be better suited for this approach, than LaTeX?
I mean, you can still just babble but style it in LaTeX ... and the AI would have to figure out that you're saying nothing.
This would require true AI.
I mean, I don't know much about LaTeX, but I doubt there are elements for "Hypothesis", "Definition", "exact reference", etc.
If you had those, described in a structured, simple language, then I guess it would be much easier for an AI to process that information, since the context would be clear.
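For what it's worth, LaTeX actually gets partway there: `\newtheorem` (built into LaTeX, extended by the `amsthm` package) lets authors declare structured environments with whatever names they like, and `\label`/`\ref` give exact references. A minimal sketch (the environment names and text are just examples):

```latex
\documentclass{article}
% \newtheorem declares a numbered, named environment.
\newtheorem{definition}{Definition}
\newtheorem{hypothesis}{Hypothesis}

\begin{document}

\begin{hypothesis}\label{hyp:main}
Knocking down the target gene reduces expression of the reporter.
\end{hypothesis}

% Exact, machine-resolvable reference via \label/\ref:
Hypothesis~\ref{hyp:main} predicts a measurable decrease.

\end{document}
```

The catch is that these conventions are optional and inconsistent across papers, so a parser still can't rely on them.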
True. I work at Google now, and my advice would be to just write standard XHTML and let Google's parsers do their best job at inferring the meaning of the text.
Someday I'd love to feed scientific papers into AI algorithms. Watson does that, and Google is starting to. But the file formats are such a mess, especially with equations and figures.
There are entire companies with squadrons of contractor scientists who just read papers and convert them to their ontology/database/analysis engine (Ingenuity, for example).
When I spoke to them, they said that if they had an AI that could do as good a job as people, they wouldn't need contractors. I think Google's approach would be to contract a bunch of scientists, have them read and interpret the papers, then use that data to train a deep net that could do it more accurately (you need some baseline humans to act as gold standards). This worked well for Google in several publicized examples, such as discriminating house numbers from numbers on cards in Street View imagery.
I understand the words, but in this context it has me a bit confused as to what problem you see AI solving.