I suggest you ...

Improve quality of automatic metadata extraction

Improve the quality of the automatic metadata extraction; add automatic retrieval of metadata from arXiv, PubMed etc.

2,366 votes
Mendeley (Admin, Mendeley) shared this idea
Tim McNamara shared a merged idea: Support Dublin Core metadata extraction from webpages

180 comments

  • Robert Knight commented

    Hello TG61,

    The automated extraction will attempt to find DOIs that appear in the PDF. If there is a DOI that you can see in the PDF content but it doesn't appear in the DOI field in the Details pane for the document then please let us know.

    If Mendeley finds a DOI or arXiv ID within the PDF, it will attempt to fetch the metadata for that identifier (from CrossRef and arxiv.org respectively).
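
To illustrate the kind of lookup described in this comment, here is a minimal Python sketch. It is not Mendeley's actual code: the regular expressions and the function names `find_identifiers` and `lookup_url` are assumptions for illustration, and they cover only the most common DOI and arXiv ID forms. The two URLs are the public CrossRef REST and arXiv query endpoints.

```python
import re
from urllib.parse import quote

# Assumed patterns: real DOIs and arXiv IDs have more variants than these cover.
DOI_RE = re.compile(r'\b(10\.\d{4,9}/[^\s"<>]+)', re.IGNORECASE)
ARXIV_RE = re.compile(r'\barXiv:\s*(\d{4}\.\d{4,5}(?:v\d+)?)', re.IGNORECASE)


def find_identifiers(text):
    """Return (doi, arxiv_id) found in extracted PDF text; either may be None."""
    doi_match = DOI_RE.search(text)
    arxiv_match = ARXIV_RE.search(text)
    # Strip punctuation that often trails a DOI at the end of a sentence.
    doi = doi_match.group(1).rstrip('.,;)') if doi_match else None
    arxiv_id = arxiv_match.group(1) if arxiv_match else None
    return doi, arxiv_id


def lookup_url(doi=None, arxiv_id=None):
    """Build the metadata lookup URL, preferring the DOI when both are present."""
    if doi:
        return "https://api.crossref.org/works/" + quote(doi, safe="/")
    if arxiv_id:
        return "http://export.arxiv.org/api/query?id_list=" + quote(arxiv_id)
    return None
```

If neither identifier is found, `lookup_url` returns `None`, and a tool would have to fall back to heuristics on the PDF text itself.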

  • MS commented

    This feature does not work well for patents. Is there a way to improve this? Thanks.

  • Gabriele commented

    The process of 1) extracting the title, 2) searching the PubMed website to find and import the PMID, and 3) importing the corrected reference should be automated in a single step.
    Gabriel
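
The three steps above could be chained roughly as follows. This is only a sketch, assuming the public NCBI E-utilities endpoints (`esearch.fcgi` to find a PMID by title, `efetch.fcgi` to retrieve the record). The function names are hypothetical, and the network calls and XML parsing are left out.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def pmid_search_url(title):
    """Step 2: URL that searches PubMed for an extracted title."""
    params = {"db": "pubmed", "term": f"{title}[Title]", "retmode": "json", "retmax": 1}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def first_pmid(esearch_json):
    """Pick the first PMID from a decoded esearch JSON response, or None."""
    ids = esearch_json.get("esearchresult", {}).get("idlist", [])
    return ids[0] if ids else None


def record_fetch_url(pmid):
    """Step 3: URL that fetches the full record for a PMID."""
    params = {"db": "pubmed", "id": pmid, "retmode": "xml"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"
```

Taking only the first hit is a simplification. A title search can return several candidates, so a real tool would probably compare authors or year before accepting one.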

  • TG61 commented

    Supporting an idea from EZ Yang: check whether a DOI is available in the PDF and, if so, use it to retrieve the metadata from the web site rather than from the PDF.

  • Robert Knight commented

    Hi Neal,

    There haven't been any changes to the metadata extraction since early this year (with the 0.9.6 release). Do you know when the problems began to appear?

  • Neal Barber commented

    The automatic metadata extraction used to work excellently, with very little error. With the new versions, however, it is terrible. For some reason, the database automatically imports a random citation (particularly for older documents), which makes managing metadata much more difficult.

  • tomhawking commented

    I would like to be able to search again even after paper data has been found, as it is sometimes retrieved incorrectly.

  • Edward Z. Yang commented

    I think better automated retrieval of metadata will go a long way toward helping us out when the heuristics fail: given a DOI, it should be a one-click operation to get the correct information from the relevant database. Those of us who are obsessive-compulsive about our data will really appreciate it!

  • shenry commented

    Many PDFs I receive have cover pages, especially articles I have requested via interlibrary loan. Mendeley doesn't properly recognize the metadata from those PDFs. A fix would be nice, because having to manually correct imported PDFs somewhat defeats the purpose of easy use. This applies to new and old PDFs, as many PDFs from journals automatically include cover pages. Thanks.

  • jose d anadon commented

    Articles from JSTOR (and most PDFs with a 'presentation' first page) are imported incorrectly.

  • cbg commented

    If there was a way to import pdfs from within Mendeley, that might make for better metadata collection. I use WOS for lots/most of my searches, and their metadata is A++. If I could use them for my imports, my metadata would be much nicer. I've completely given up on Google Scholar because it is almost always wrong. And when it isn't wrong, it is incomplete.

  • Nathan Boland commented

    I have been using the Google Scholar search option for my journal articles to find the article's website, get the DOI from it, and then use the DOI to correct the horrible entry that Google Scholar provided. Perhaps this could be automated?
    DOI entries seem to be pretty good sources of bibliographic info (although not perfect).

  • Jason commented

    For older references, have Mendeley run optical character recognition (OCR) to digitize the text. Otherwise the text cannot be highlighted, etc.

  • Sjúrður Hammer commented

    Hendrik, I would suggest you send a bug report to support@mendeley.com so they could get on it. Otherwise, I agree that you should be able to control or prioritize which extraction you want to use.

  • Hendrik commented

    I want an option to override the automatically detected metadata. I want to paste the *correct* BibTeX entry into a field. The auto-detection gives me the wrong paper, with even the wrong title, i.e. it does not work at all (for one specific paper). This is annoying, given that BibTeX input would be feasible.

    Otherwise: I like the tool. I'm not 100% convinced that I will save time, though. The idea is great.

  • neboland commented

    Currently, when you use the Google Scholar search and it finds a "match", it automatically updates the fields without asking you to accept the "match". Sometimes the "match" is completely wrong, and then you have to manually fix all of the fields, not just the few that were wrong.
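
One way to avoid silent overwrites is to compute which fields a candidate match would change and apply only those the user accepts. The sketch below is purely illustrative: the function names and the dict-based record layout are assumptions, not anything Mendeley exposes.

```python
def changed_fields(current, candidate):
    """Map each field the candidate would change to an (old, new) pair."""
    return {
        key: (current.get(key), value)
        for key, value in candidate.items()
        if value and current.get(key) != value  # ignore empty candidate values
    }


def apply_accepted(current, candidate, accepted):
    """Return a copy of `current` with only the accepted fields overwritten."""
    updated = dict(current)
    for key in accepted:
        if key in candidate:
            updated[key] = candidate[key]
    return updated
```

The result of `changed_fields` could be shown as a confirmation dialog, so that a wrong match is rejected before it touches any field.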

  • Simon McGrath commented

    Google Scholar seems to have no problem accurately extracting data from both web pages and PDFs and identifying duplicate listings of the same paper (as in "view all X versions"). Mendeley is a brilliant idea but gets into a mess with this, making setting up the library a real pain. Maybe you could licence the relevant technology from Google? You are reinventing the wheel, and it currently seems more square than round. The rest of Mendeley is wonderful - thank you.

  • perezxolote commented

    Triggering the search by title alone is also a problem. The ISBN, if available, would be much more accurate (as implemented in Zotero). Because it is focused more strongly on articles, Mendeley has a very annoying problem when extracting data from books: although most of the data are correct, it always turns a book into a journal article, which prevents extraction of data such as city and publisher.
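
As a rough sketch of how an ISBN could drive both the lookup and the document type, the Python below validates ISBN-10 and ISBN-13 check digits and treats a valid ISBN in the text as a sign that the item is a book. The function names, the regular expression, and the two type labels are assumptions made for illustration.

```python
import re


def normalize_isbn(raw):
    """Drop spaces and hyphens and upper-case a trailing 'x' check digit."""
    return re.sub(r"[\s-]", "", raw).upper()


def is_valid_isbn(raw):
    """Check the ISBN-13 (mod 10) or ISBN-10 (mod 11) check digit."""
    isbn = normalize_isbn(raw)
    if re.fullmatch(r"\d{13}", isbn):
        # ISBN-13: digits weighted 1, 3, 1, 3, ... must sum to a multiple of 10.
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(isbn))
        return total % 10 == 0
    if re.fullmatch(r"\d{9}[\dX]", isbn):
        # ISBN-10: digits weighted 10 down to 1 must sum to a multiple of 11.
        total = sum((10 if c == "X" else int(c)) * (10 - i) for i, c in enumerate(isbn))
        return total % 11 == 0
    return False


def guess_document_type(text):
    """Assumed heuristic: a valid ISBN in the text suggests a book."""
    match = re.search(r"ISBN(?:-1[03])?:?\s*([\dXx -]{10,17})", text)
    if match and is_valid_isbn(match.group(1)):
        return "book"
    return "journal_article"
```

Validating the check digit before searching filters out misread digit strings, which matters when the text comes from OCR or heuristic extraction.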

  • noone commented

    I would like an option to disable automatic details extraction.
    Out of 100 documents tested, not one got the right info.
    For example, the original author in the PDF properties says "Danny Something"; automatic extraction says "this, how, Can, Tradition". Nice idea but useless.
