Improve quality of automatic metadata extraction
Improve the quality of automatic metadata extraction; add automatic retrieval of metadata from arXiv, PubMed, etc.
146 comments
-
alberto
commented
Please consider giving us an option to manually re-search for an article entry if it is wrong, and to use services other than Google Scholar. Papers 2 seems to handle this well and lets you select which services to search for metadata.
-
splash
commented
Some publishers, like ACS, include the DOI for a paper in the link returned from Google Scholar. Why not automatically scan the returned links for a DOI and, if one is present, import it?
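A minimal sketch of that link scan, in Python. The regular expression only approximates the usual DOI shape ("10.", a numeric registrant code, "/", a suffix) and will miss some valid DOIs; the function name `doi_from_link` is hypothetical and not part of Mendeley.

```python
import re
from typing import Optional
from urllib.parse import unquote

# Approximate DOI shape: "10." + 4-9 digit registrant code + "/" + suffix.
# Stops at whitespace and at characters that usually start URL query parts.
DOI_RE = re.compile(r"10\.\d{4,9}/[^\s?#&\"'<>]+")


def doi_from_link(url: str) -> Optional[str]:
    """Return the first DOI-like substring found in a link, or None."""
    # Decode %2F and similar escapes so DOIs passed as query values are found.
    match = DOI_RE.search(unquote(url))
    if match is None:
        return None
    # Trailing punctuation is almost never part of the DOI itself.
    return match.group(0).rstrip(".,;)")
```

If a DOI is found, the record could then be looked up by DOI instead of relying on the fields scraped from Google Scholar.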
-
splash
commented
Automatic import of GeoRef references (e.g. GeoScienceWorld) would be nice:
http://ammin.geoscienceworld.org/content/74/1-2/224.short
http://www.geoscienceworld.org/cgi/georef/1989063655
-
Reece
commented
If an article is in PubMed, its DOI and PubMed ID should be stored with the record. Google Scholar search apparently doesn't provide them.
-
colomb
commented
Also look here at what ReadCube is doing; it may help design a better solution.
-
Martin B.
commented
Hey guys,
I just submitted this request about extracting emails from documents. I don't know whether it should stay separate or go in here.
Cheers
-
J.H.D.
commented
It seems to me that metadata import/management is *the* (potentially) distinguishing feature on which Mendeley will eventually succeed or fail. While users will give you a break for a while, in the long run the bar for "acceptable" here will be very high, and doing it badly will eventually be fatal.
Closely related is how easily users can fix incorrect data. I just imported my library of several thousand documents; hundreds of them should have been read correctly but were mangled, and hundreds more are talks and other items that I'd like to catalog but don't expect to be handled automatically. I'd grit my teeth and fix them if it were slick and fast, but it's slow and clumsy. So, do I give up, or wait and hope that the UI gets better?
-
Carl Anderson
commented
Mendeley's ability to garble metadata gleaned from JSTOR is particularly odd. Unless the paper has an easily grabbable DOI (which, of course, many older papers do not), Mendeley makes bizarrely erroneous guesses, which is hard to understand given that JSTOR provides pretty much all the relevant metadata in a fairly standardized format, even with the PDFs. Yet Mendeley usually scoops up the "added" date rather than the "published" date, or interprets the "published" date as an issue number, or piles most of the metadata (including the stable URL, which ought to be quite machine-recognizable!) into the title, etc. This sort of thing ought to be reasonably easy to fix, or so I would imagine.
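The JSTOR point can be illustrated with a small parser. This sketch assumes the first-page text of a JSTOR PDF has already been extracted as plain lines, with the title first and then labelled lines such as "Author(s):", "Source:", "Stable URL:" and "Accessed:"; that label set is an assumption, not a documented JSTOR format, and `parse_cover_page` is a hypothetical name. Keeping each labelled value in its own field means the access date cannot be mistaken for the publication date and the stable URL never lands in the title.

```python
import re
from typing import Dict, List

# Assumed labels on a JSTOR cover page; adjust if the real layout differs.
LABELS = ("Author(s)", "Source", "Published by", "Stable URL", "Accessed")
LABEL_RE = re.compile(r"^(%s):\s*(.+)$" % "|".join(re.escape(label) for label in LABELS))


def parse_cover_page(cover_text: str) -> Dict[str, str]:
    """Split cover-page text into a title plus labelled fields."""
    fields: Dict[str, str] = {}
    title_lines: List[str] = []
    for raw_line in cover_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = LABEL_RE.match(line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
        elif not fields:
            # Unlabelled lines before the first label are taken as the title.
            title_lines.append(line)
    fields["Title"] = " ".join(title_lines)
    return fields
```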
-
Carin Basson
commented
I've found that metadata extraction for Plant Physiology and PNAS papers is prone to problems. Both journals put their bibliographic information at the bottom of the page.
-
Vicent
commented
In my previous comment on this request, I was referring to this feature in Reference Manager:
http://www.refman.com/rminfo_inet.asp
You can search for a paper and have its fields imported into your database very easily!
-
Carl Anderson
commented
I am surprised that Mendeley doesn't make better use of its own database of papers to help identify the correct metadata. It seems to me that good behaviour would be for Mendeley, when a new item is added, to identify things like the author and title, search the entries it already knows about, and present the user with some options ("Is your new entry this paper, this paper, or this paper?") from which the user can select the right one.
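The "this paper, this paper, or this paper?" prompt amounts to fuzzy matching the extracted title against titles already in the catalogue. A minimal sketch using Python's standard `difflib.get_close_matches`; the catalogue here is just a list of strings (a real service would need an index rather than a linear scan), and the function name and the 0.6 cutoff are illustrative assumptions.

```python
import difflib
from typing import List


def candidate_titles(extracted: str, catalogue: List[str], n: int = 3) -> List[str]:
    """Return up to n catalogue titles that resemble the extracted title, best first."""
    # Compare case-insensitively but hand back the catalogue's own spelling.
    by_lower = {title.lower(): title for title in catalogue}
    matches = difflib.get_close_matches(extracted.lower(), list(by_lower), n=n, cutoff=0.6)
    return [by_lower[m] for m in matches]
```

An empty result would mean no confident match, in which case the user could fall back to entering the details by hand.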
-
Patrik B
commented
Metadata extraction could be better. LaTeX commands in titles seem to be a problem. Example: http://pra.aps.org/abstract/PRA/v26/i6/p3325_1
The BibTeX file is there with the correct information; could Mendeley optionally import the associated BibTeX when it is available?
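Short of importing the associated BibTeX, titles containing LaTeX could at least be tidied. The sketch below handles only a few common cases (formatting commands such as `\emph{...}`, a handful of Greek letters, math-mode `$` and grouping braces); it is not a LaTeX parser, and the command list and function name are assumptions chosen for illustration.

```python
import re

# Formatting commands whose single argument should simply be kept.
_WRAPPER = re.compile(r"\\(?:emph|textit|textbf|textrm|mathrm|mathit)\{([^{}]*)\}")
# A few Greek letters; a real table would be much longer.
_GREEK = {"alpha": "\u03b1", "beta": "\u03b2", "gamma": "\u03b3", "mu": "\u03bc"}
_GREEK_CMD = re.compile(r"\\(%s)(?![A-Za-z])" % "|".join(_GREEK))


def clean_latex_title(title: str) -> str:
    """Turn simple LaTeX markup in a title into plain Unicode text."""
    text = title
    # Unwrap formatting commands repeatedly so nested ones are handled too.
    previous = None
    while previous != text:
        previous = text
        text = _WRAPPER.sub(r"\1", text)
    text = _GREEK_CMD.sub(lambda m: _GREEK[m.group(1)], text)
    # Drop math-mode delimiters and leftover grouping braces.
    text = text.replace("$", "").replace("{", "").replace("}", "")
    return re.sub(r"\s+", " ", text).strip()
```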
-
ebioman
commented
Just found a minor inconvenience related to metadata extraction:
Page ranges are always abbreviated, e.g. 273-95 instead of 273-295.
I assume this has some advantages, but on the other hand it leads to confusion with more complex literature.
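Expanding an abbreviated range is mechanical: borrow the missing leading digits from the first page. A minimal sketch; the function name is hypothetical, and anything it cannot interpret safely (article numbers, letter-prefixed pages, ranges whose expansion would end before they start) is returned unchanged.

```python
def expand_page_range(pages: str) -> str:
    """Expand an abbreviated page range such as '273-95' to '273-295'."""
    # Accept an en dash as well as a hyphen between the two pages.
    parts = [p.strip() for p in pages.replace("\u2013", "-").split("-")]
    if len(parts) != 2 or not all(p.isdigit() for p in parts):
        return pages  # e.g. 'e1234' or 'S12-S15': leave as imported
    start, end = parts
    if len(end) < len(start):
        # Borrow the missing leading digits from the first page.
        end = start[: len(start) - len(end)] + end
    if int(end) < int(start):
        return pages  # expansion would be nonsensical, so don't guess
    return f"{start}-{end}"
```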
I would prefer to be able to decide this in the settings menu or somewhere similar.
-
Maurizio Paolillo
commented
Please allow metadata to be retrieved from NASA ADS (and maybe arXiv). They are much more accurate than Google Scholar for astrophysical publications.
-
Vicent
commented
Please, Mendeley staff, copy the way Reference Manager v12 does it: you can use a search box directly in the desktop program's interface, and it searches all the available databases. Its metadata retrieval is said to be good.
-
Naglaasoliman
commented
This is very helpful to me.
Thanks!
-
Yoriko Y
commented
Metadata extraction should be made to work with languages other than English as well, especially those with non-Roman characters. Right now, Mendeley produces nonsensical metadata (if any) for most of the documents I import that aren't in English.
-
Markus
commented
ACM Digital Library import also does not work 100% correctly. It seems that, when using the bookmarklet, publication titles are imported only up to the first colon (if there is one). For example, for http://dl.acm.org/citation.cfm?id=985692.985698 the bookmarklet imports the title "Caretta", even though the full title is "Caretta: a system for supporting face-to-face collaboration by integrating personal and shared spaces".
-
lukeb
commented
NASA ADS: the JavaScript button for importing from NASA ADS often imports the comma between multiple authors. This comma ends up in the .bib file, and BibTeX won't run properly until you delete the comma manually.
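Until the import itself is fixed, the author field could be normalized before the .bib file is written. This sketch assumes the stray comma shows up either as an empty entry between two "and"s or stuck to the end of a name; the exact form the ADS import produces isn't stated here, so treat that as an assumption, and `clean_bibtex_authors` is a hypothetical name. BibTeX separates names with the word "and", so the fix is to split on it, trim edge commas, and drop empty entries.

```python
import re


def clean_bibtex_authors(field: str) -> str:
    """Remove stray commas from a BibTeX author field such as 'A and , and B'."""
    # BibTeX separates authors with the word "and" surrounded by whitespace.
    names = re.split(r"\s+and\s+", field.strip())
    cleaned = []
    for name in names:
        # A comma inside "Last, First" is meaningful; one at either end is not.
        name = name.strip().strip(",").strip()
        if name:
            cleaned.append(name)
    return " and ".join(cleaned)
```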
-
Yves
commented
For papers from IOP (Institute of Physics), the first page is usually a nonsensical list of useless information, so during automatic import Mendeley usually cannot extract the journal information correctly. It would be great if Mendeley could SMARTLY skip the first page.
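One possible heuristic for skipping such a cover page, offered only as a sketch: look at the first few pages and use the first one that contains both a DOI-like string and the word "Abstract", falling back to page one. The page texts are assumed to have been extracted already, the function name is hypothetical, and the heuristic itself is an assumption that would misfire on documents without an abstract.

```python
import re
from typing import List

# Approximate DOI shape; will miss some valid DOIs.
DOI_RE = re.compile(r"\b10\.\d{4,9}/\S+")
ABSTRACT_RE = re.compile(r"\babstract\b", re.IGNORECASE)


def pick_metadata_page(page_texts: List[str], max_pages: int = 3) -> int:
    """Return the index of the page most likely to carry the article header."""
    for index, text in enumerate(page_texts[:max_pages]):
        # A real article header usually has both a DOI and an abstract.
        if DOI_RE.search(text) and ABSTRACT_RE.search(text):
            return index
    return 0  # nothing looked like a header, so keep the default behaviour
```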