Improve quality of automatic metadata extraction
Improve the quality of the automatic metadata extraction; add automatic retrieval of metadata from arXiv, PubMed etc.
Carl Edlund Anderson commented
There are just weird (and what I would think are easily fixable) errors in the way Mendeley gets the metadata from JSTOR PDFs. The way JSTOR presents its metadata within the PDF is a bit old-fashioned (if human-friendly), but quite regular. There's really no reason for Mendeley to confuse the download date with the publication date, etc.
Improve the quality of automatic metadata extraction for JSTOR's PDFs!!!!!
Please improve the quality of automatic metadata extraction for JSTOR's PDFs!!!!!
James Doyle commented
Review, search by title... if you can't fix it, add a "stop searching" option.
Marcis Gasuns commented
http://archive.org/index.php Support for its PDFs, at least, would not hurt (e.g., http://archive.org/details/akoontalorlostr00monigoog)
Danielle L commented
Wow, people have been aware of this problem for a long time and it is still not fixed. Any comment from Mendeley?
Is it possible to re-enter document details automatically? Just leave the "Search by Title" button.
YES!! Improve the quality of importing from JSTOR... (and from PDFs too!)
Eric Moyer commented
The following may be adaptable to better extraction from PDFs:
The code is GPL v3, so it may not work for you, but their paper is, at least, worth reading.
Carl Edlund Anderson commented
Beyond improving its ability to extract metadata accurately from third-party resources (whether the journal's or publisher's web site, Google Books, or other databases), it seems to me that Mendeley ought to do a better job of mining its own store of metadata, whether held in the user's local database or in the aggregated store of all Mendeley users. It should be perfectly possible for Mendeley to compare the metadata for an item in a local database with the metadata for similar items in its local or remote databases, and either automatically or upon request suggest alterations or additions. For example, your entry for an article might be missing an ISSN, but Mendeley might guess that it is the same article that 20 other people have metadata for, with the ISSN included, and it could ask whether you want to add the ISSN to your local entry's metadata.
That shouldn't be that hard, I should think ....? And potentially very useful!
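A minimal sketch of the suggestion step described above, in Python. It assumes the hard part (deciding which other records are the same article) has already been done, and only shows how missing fields could be proposed when most matching records agree. The function name, the `min_agreement` threshold, and the sample records (including the placeholder ISSN) are all made up for illustration; this is not how Mendeley works internally.

```python
from collections import Counter


def suggest_missing_fields(local, matches, min_agreement=0.8):
    """Propose values for fields that are empty in `local`.

    `matches` is a list of records already judged to describe the same
    article. A value is suggested only when at least `min_agreement` of
    all matching records carry that exact value.
    """
    suggestions = {}
    fields = {key for record in matches for key in record}
    for field in sorted(fields):
        if local.get(field):
            continue  # never overwrite what the user already entered
        values = [record[field] for record in matches if record.get(field)]
        if not values:
            continue
        value, count = Counter(values).most_common(1)[0]
        if count / len(matches) >= min_agreement:
            suggestions[field] = value
    return suggestions


# Hypothetical records; the ISSN is a placeholder, not a real journal.
local = {"title": "An Example Article", "year": "2009", "issn": ""}
matches = [
    {"title": "An Example Article", "issn": "1234-5678"},
    {"title": "An Example Article", "issn": "1234-5678"},
    {"title": "An example article", "issn": "1234-5678", "volume": "12"},
]
print(suggest_missing_fields(local, matches))
```

The result would be offered to the user as a suggestion rather than applied silently, which matches the "ask you if you wanted to add" behaviour requested above.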
If the metadata extraction is improved, will this also give further options in the file rename feature?
Philipp Stachwitz commented
The number of votes and comments shows that this is the major problem with Mendeley. Incorrect metadata was my reason for leaving Mendeley and going back to the commercial product I came from (Papers for Mac). Why isn't it at least possible to add the correct metadata when it is available, as in so many cases where publishers provide it? BMJ, for example.
First of all, improve the quality of automatic metadata extraction for files I create (.doc or .pdf). I entered metadata when I created a .pdf or .doc file (author, title, tags, ... in the document properties), but Mendeley doesn't recognize this metadata :(
Simone Hochgreb commented
Springer link documents are not being picked up. I tried with page:
and it prompts me to fill out the boxes rather than extracting it.
It may have something to do with the fact that this is a Chinese server - I had to remove the block from the browser to allow it to save to Mendeley.
The number of "Needs Review" papers could be reduced with a simple procedure.
E.g., I often have arXiv papers for which Mendeley has already retrieved the arXiv number. If Mendeley also did an automatic arXiv number lookup, which most of the time brings up the DOI, and then did a DOI lookup, the whole metadata would be immediately available.
So please allow the user to apply rules to metadata retrieval, just as you can apply rules to your email in email software like Thunderbird or Outlook.
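The arXiv-number-then-DOI chain suggested above could look roughly like the Python sketch below. It assumes the public arXiv API endpoint at export.arxiv.org and its Atom response, in which an `arxiv:doi` element appears only when the authors have supplied a DOI. The helper names are made up, no network request is made here, and the sample feed and its DOI are hand-written placeholders. The final DOI lookup is left out.

```python
import re
import xml.etree.ElementTree as ET

# Matches new-style (0901.1234v2) and old-style (hep-th/9901001) identifiers.
ARXIV_ID = re.compile(
    r"arXiv:\s*(\d{4}\.\d{4,5}(?:v\d+)?|[a-z\-]+(?:\.[a-z]{2})?/\d{7}(?:v\d+)?)",
    re.IGNORECASE,
)
ARXIV_API = "http://export.arxiv.org/api/query"
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def find_arxiv_id(text):
    """Return the first arXiv identifier found in extracted PDF text, or None."""
    match = ARXIV_ID.search(text)
    return match.group(1) if match else None


def lookup_url(arxiv_id):
    """Build the arXiv API query URL for a single identifier."""
    return f"{ARXIV_API}?id_list={arxiv_id}"


def doi_from_feed(atom_xml):
    """Pull the DOI out of an arXiv Atom response, if one is present."""
    root = ET.fromstring(atom_xml)
    node = root.find("atom:entry/arxiv:doi", NS)
    return node.text.strip() if node is not None and node.text else None


# Hand-written sample response with a placeholder DOI (not a real paper).
sample_feed = (
    '<feed xmlns="http://www.w3.org/2005/Atom" '
    'xmlns:arxiv="http://arxiv.org/schemas/atom">'
    "<entry><arxiv:doi>10.1000/example</arxiv:doi></entry></feed>"
)

arxiv_id = find_arxiv_id("arXiv:0901.1234v2 [hep-th] 5 Jan 2009")
print(arxiv_id, lookup_url(arxiv_id), doi_from_feed(sample_feed))
```

A user-defined rule, in the sense asked for above, would then be little more than "if an arXiv number is present, run this chain before marking the paper as Needs Review".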
Bobby Tang commented
Could you please allow support for SAE papers? Those papers are very popular among automotive engineers. Here is an example: http://papers.sae.org/2009-01-1137/
Also, it seems that IEEE Xplore is not working very well. Very often I have to modify a lot of fields myself.
If the author name is Jack Smith, it is entered as "Smith, Jack". Different authors are separated by a carriage return.
It shows as "A. Smith" if you do not click on the Authors field, and different author names appear separated by ",".
If you enter "Jack Smith", Mendeley will automatically reorder it.
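The convention described above (one author per line, stored as "Last, First", shown as initials plus surname) can be sketched in Python as follows. This is only an illustration of the described behaviour, not Mendeley's code: the function names are made up, and treating the last word of an unreordered name as the surname is a naive assumption that breaks for multi-word surnames.

```python
def normalize_author(name):
    """Return a name in "Last, First" form."""
    name = " ".join(name.split())  # collapse stray whitespace
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
    else:
        # Naive assumption: the final word is the surname.
        first, _, last = name.rpartition(" ")
    return f"{last}, {first}" if first else last


def parse_authors(field):
    """Split an Authors field into one normalized name per line."""
    return [normalize_author(line) for line in field.splitlines() if line.strip()]


def display_authors(authors):
    """Render the compact view: initials plus surname, comma separated."""
    short = []
    for author in authors:
        last, _, first = author.partition(", ")
        initials = " ".join(f"{part[0]}." for part in first.split())
        short.append(f"{initials} {last}".strip())
    return ", ".join(short)


authors = parse_authors("Jack Smith\nDoe, Jane")
print(authors)
print(display_authors(authors))
```

Entering four authors therefore means four lines in the field, not one line separated by "," or ";".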
I am a beginner with Mendeley, but what a headache. I have read plenty of forums and still can't solve a relatively simple problem. I have a working paper (Carbon Capability: what does it mean, how prevalent is it, and how can we promote it?) written by 4 authors. If I add it from Google Scholar, only the first one is filled in. If I drop in the PDF file, only 2 of them go in. I have manually entered all 4 of us, and Mendeley mangles them all in several ways, never the correct one. I have tried separating them with a "," and with a ";". Is it possible that there is no FAQ explaining how to fill in these metadata, given that they always come out wrong? Any help would be appreciated.
The page numbers problem does not come from Mendeley but from PubMed. I am not sure why they are so stingy with the format.
Kacper Rucinski commented
A few observations:
- We are in the 21st century, and computers support Unicode. I see no reason why non-ASCII characters get substituted in titles/abstracts upon metadata extraction. E.g., Δ gets substituted with "Delta", ± with "+/-", ² with "2", etc. That's incorrect and makes searching so much more difficult.
- Page numbers: the world uses the following format: 231-236; Mendeley proposes: "231-36". World: 12-15; Mendeley: "12-5". Seriously!
- Allow configuring the order in which metadata is displayed. I would like Month to come after Year, and not between Keywords and URL.
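On the page number point, expanding the abbreviated form back to the full one is mechanical. A minimal Python sketch, assuming plain numeric ranges (the function name is made up, and anything non-numeric such as "e12-e15" is returned unchanged):

```python
def expand_page_range(pages):
    """Expand an abbreviated range such as "231-36" to "231-236"."""
    start, sep, end = pages.replace("\u2013", "-").partition("-")
    start, end = start.strip(), end.strip()
    if not sep or not (start.isdigit() and end.isdigit()):
        return pages  # not a simple numeric range; leave it alone
    if len(end) < len(start):
        # Borrow the missing leading digits from the start page.
        end = start[: len(start) - len(end)] + end
    return f"{start}-{end}"


for raw in ("231-36", "12-5", "231-236"):
    print(raw, "->", expand_page_range(raw))
```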