Improve quality of automatic metadata extraction
Improve the quality of the automatic metadata extraction; add automatic retrieval of metadata from arXiv, PubMed etc.
Robert Knight commented
The automated extraction will attempt to find DOIs that appear in the PDF. If there is a DOI that you can see in the PDF content but it doesn't appear in the DOI field in the Details pane for the document then please let us know.
If Mendeley finds a DOI or arXiv ID within the PDF, it will attempt to fetch the data for that identifier (from CrossRef and arxiv.org respectively).
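For readers wondering what "find a DOI or arXiv ID, then fetch from CrossRef or arxiv.org" might look like, here is a minimal sketch. It is not Mendeley's actual code: the regular expressions, the function names and the choice of the public CrossRef REST and arXiv query endpoints are all assumptions made for illustration, and `10.1000/xyz123` is only a placeholder DOI. The sketch only builds the lookup URL; it does not perform the request.

```python
import re
from typing import Optional
from urllib.parse import quote

# Assumed patterns: DOIs start with "10." plus a 4-9 digit registrant code;
# new-style arXiv IDs look like "arXiv:0910.1234", optionally with a version.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')
ARXIV_RE = re.compile(r'arXiv:\s*(\d{4}\.\d{4,5})(v\d+)?', re.IGNORECASE)


def find_doi(text: str) -> Optional[str]:
    """Return the first DOI-looking string in the text, or None."""
    match = DOI_RE.search(text)
    if match is None:
        return None
    # Trailing punctuation is usually sentence residue, not part of the DOI.
    # (This would wrongly trim a DOI that really ends in ")".)
    return match.group(0).rstrip('.,;)')


def find_arxiv_id(text: str) -> Optional[str]:
    """Return the first new-style arXiv ID (without version), or None."""
    match = ARXIV_RE.search(text)
    return match.group(1) if match else None


def lookup_url(text: str) -> Optional[str]:
    """Prefer a DOI lookup at CrossRef, fall back to an arXiv ID lookup."""
    doi = find_doi(text)
    if doi is not None:
        return "https://api.crossref.org/works/" + quote(doi, safe="/")
    arxiv_id = find_arxiv_id(text)
    if arxiv_id is not None:
        return "http://export.arxiv.org/api/query?id_list=" + arxiv_id
    return None
```

If no identifier is found, `lookup_url` returns `None`, which is the point where a tool would have to fall back to heuristics on the PDF text itself.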
This feature does not work well for patents. Is there a way to improve this? Thanks.
The whole process should be automated in a single step: 1) title extraction, 2) searching the PubMed website to find and import the PMID, and 3) importing the corrected reference.
Supporting one idea from EZ Yang: check whether a DOI is available in the PDF and, if so, use it to fetch the metadata from the web site rather than from the PDF.
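The three steps above could be chained roughly as follows. This is only a sketch under assumptions: it uses NCBI's public E-utilities (ESearch to find the PMID, EFetch to pull the record), the function names are made up for illustration, and it builds the request URLs and parses an ESearch JSON reply without making any network calls.

```python
import json
from typing import Optional
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(title: str) -> str:
    """Step 2: build an ESearch query that looks the extracted title up in PubMed."""
    params = {"db": "pubmed", "term": f"{title}[Title]", "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def first_pmid(esearch_json: str) -> Optional[str]:
    """Step 2 (continued): take the first PMID from an ESearch JSON reply."""
    reply = json.loads(esearch_json)
    id_list = reply.get("esearchresult", {}).get("idlist", [])
    return id_list[0] if id_list else None


def efetch_url(pmid: str) -> str:
    """Step 3: build the EFetch request that returns the full record as XML."""
    params = {"db": "pubmed", "id": pmid, "retmode": "xml"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"
```

Taking the first hit blindly is exactly how a wrong citation gets imported, so a real implementation would probably want to compare the returned title with the extracted one before accepting it.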
Robert Knight commented
There haven't been any changes to the metadata extraction since early this year (with the 0.9.6 release). Do you know when the problems began to appear?
Neal Barber commented
The automatic metadata extraction worked excellently, with very little error. With the new versions, however, the metadata extraction is terrible. For some reason, the database automatically imports a random citation (particularly for older documents). It makes managing metadata much more difficult.
I would like to be able to search again, even after paper data has been found, as sometimes it is retrieved incorrectly.
Edward Z. Yang commented
I think better automated retrieval of metadata will go a long way to helping us out when the heuristics fail: given a DOI, it should be a one-click operation to get correct information from the relevant database. Those of us who are obsessive-compulsive about our data will really appreciate it!
Many PDFs I receive have cover pages, especially articles I have requested via interlibrary loan. Mendeley doesn't properly recognize the metadata from those PDFs. A fix would be nice, because having to manually correct imported PDFs somewhat defeats the purpose of easy use. This applies to new and old PDFs, as many PDFs from journals automatically include cover pages. Thanks.
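One possible way to handle cover pages is to skip any leading page that looks like provider boilerplate before running the title/author heuristics. The sketch below is purely illustrative: the marker phrases, the threshold and the function names are assumptions, not anything Mendeley is known to do, and a real list would need tuning per provider.

```python
from typing import List

# Hypothetical marker phrases that tend to appear on provider cover pages.
COVER_MARKERS = (
    "jstor",
    "terms and conditions",
    "interlibrary loan",
    "your use of the",
    "downloaded from",
)


def looks_like_cover_page(page_text: str, min_hits: int = 2) -> bool:
    """Treat a page as a cover page if it contains several marker phrases."""
    lowered = page_text.lower()
    hits = sum(1 for marker in COVER_MARKERS if marker in lowered)
    return hits >= min_hits


def first_content_page(pages: List[str]) -> int:
    """Return the index of the first page that does not look like a cover page."""
    for index, text in enumerate(pages):
        if not looks_like_cover_page(text):
            return index
    return 0  # every page matched; fall back to the first one
```

Requiring at least two markers is a deliberate choice in this sketch, so that an article that merely mentions JSTOR on its real first page is not skipped.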
jose d anadon commented
Articles from JSTOR (and most with a first 'presentation' page in the PDF) are imported incorrectly.
If there was a way to import pdfs from within Mendeley, that might make for better metadata collection. I use WOS for lots/most of my searches, and their metadata is A++. If I could use them for my imports, my metadata would be much nicer. I've completely given up on Google Scholar because it is almost always wrong. And when it isn't wrong, it is incomplete.
Nathan Boland commented
I have been using the Google Scholar search option for my journal articles to find the article's web page and, from there, its DOI, which I can then use to correct the horrible entry that Google Scholar provided. Perhaps this could be automated?
DOI entries seem to be pretty good sources of bibliographic info (although not perfect).
For older references, have Mendeley run optical character recognition (OCR) to digitize the text. Otherwise the text cannot be highlighted, etc.
Sjúrður Hammer commented
Hendrik, I would suggest you send a bug report to email@example.com so they could get on it. Otherwise, I agree that you should be able to control or prioritize which extraction you want to use.
I want an option to override the automatically detected metadata. I want to paste the *correct* BibTeX entry into a field. The auto-detection gives me the wrong paper, with even the wrong title, i.e. it does not work at all (for one specific paper). This is annoying, given that BibTeX input would be feasible.
Otherwise: I like the tool. I'm not 100% convinced that I will save time, though. The idea is great.
Get book information from the ISBN, like http://manas.tungare.name/software/isbn-to-bibtex/
Sometimes the extraction from Amazon does not run.
Currently, when you use the Google Scholar search and it finds a "match", it automatically updates the fields without asking you to accept the "match". Sometimes the "match" is completely wrong, and then you have to manually fix all of the fields, not just the few that were wrong.
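Before any ISBN is sent to a lookup service, it is worth checking that the number is well formed, since text pulled from a PDF often contains stray digits. The check-digit rules below are the standard ISBN-10 and ISBN-13 ones; the function names are made up for this sketch, and the lookup itself is left out.

```python
import re
from typing import Optional


def normalize_isbn(raw: str) -> str:
    """Drop spaces and hyphens and upper-case a trailing 'x'."""
    return re.sub(r"[\s-]", "", raw).upper()


def is_valid_isbn10(isbn: str) -> bool:
    """ISBN-10: weights 10..1, sum divisible by 11, 'X' stands for 10."""
    if not re.fullmatch(r"\d{9}[\dX]", isbn):
        return False
    total = sum(
        (10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(isbn)
    )
    return total % 11 == 0


def is_valid_isbn13(isbn: str) -> bool:
    """ISBN-13: alternating weights 1 and 3, sum divisible by 10."""
    if not re.fullmatch(r"\d{13}", isbn):
        return False
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(isbn))
    return total % 10 == 0


def valid_isbn(raw: str) -> Optional[str]:
    """Return the normalized ISBN if its check digit is correct, else None."""
    isbn = normalize_isbn(raw)
    if is_valid_isbn10(isbn) or is_valid_isbn13(isbn):
        return isbn
    return None
```

A valid ISBN is also a strong hint that the document is a book rather than a journal article, which could be used to pick the right document type before filling in publisher and city.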
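The "ask before overwriting" behaviour requested above could be sketched like this: compute which fields the match would change, show them to the user, and apply only the ones that are accepted. The data model (a plain dict of field name to string) and both function names are assumptions for illustration.

```python
from typing import Dict, Iterable, Tuple

Metadata = Dict[str, str]


def propose_changes(current: Metadata, candidate: Metadata) -> Dict[str, Tuple[str, str]]:
    """Map each field the match would change to its (current, proposed) pair."""
    changes = {}
    for field, new_value in candidate.items():
        old_value = current.get(field, "")
        # Empty proposals are ignored so a match never blanks out a field.
        if new_value and new_value != old_value:
            changes[field] = (old_value, new_value)
    return changes


def apply_accepted(current: Metadata, candidate: Metadata, accepted: Iterable[str]) -> Metadata:
    """Return a copy of `current` with only the accepted fields overwritten."""
    updated = dict(current)
    for field in accepted:
        if field in candidate:
            updated[field] = candidate[field]
    return updated
```

With this split, a completely wrong "match" costs the user one rejection instead of a manual repair of every field, and a partly right one can still fill in the fields that were missing.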
Simon McGrath commented
Google Scholar seems to have no problem accurately extracting data from both web pages and PDFs and identifying duplicate listings of the same paper (as in "view all X versions"). Mendeley is a brilliant idea but gets in a mess with this, making setting up the library a real pain. Maybe you could licence the relevant technology from Google? You are reinventing the wheel, and it currently seems more square than round. The rest of Mendeley is wonderful - thank you.
Triggering the search by title alone is also a problem. The ISBN, if available, would be much more accurate (as implemented in Zotero). Because it is focused more strongly on articles, Mendeley has a very annoying problem when extracting data for books: although most of the data is correct, it always turns a book into a journal article, which prevents the extraction of fields such as city and publisher.
I would like an option to disable automatic details extraction.
Out of 100 documents tested, not one got the right info.
For example, the original author from the PDF properties says "Danny Something"; the automatic extraction says "this, how, Can, Tradition". Nice idea, but useless.