VSzA techblog

Accented characters in hyperref PDF fields


I've always found hyperref one of the best features of LaTeX. Although it supports Unicode, certain accented characters (in my case, ő and ű) came out mangled in PDF metadata fields such as author and title. I mostly ignored the issue and reworded the contents, until I ran into a situation where changing the data was not an option. To illustrate the issue, the following example was saved as wrong.tex and compiled with the pdflatex wrong.tex command.


\usepackage[unicode, pdftitle={Árvíztűrő tükörfúrógép}]{hyperref}


The result could be checked with pdfinfo and was far from what I expected.

$ pdfinfo wrong.pdf | grep Title
Title:          Árvízt¶r® tükörfúrógép

I searched the web and was disappointed at first, having found only unsolved forum threads, including one written by a fellow Hungarian. Finally, I opened up the TeX section of the Stack Exchange network and started typing a title for my question. Based on this, the site offered a number of possibly related posts, and I browsed through them out of curiosity. As it turned out, the solution lay in a post about Polish characters in pdftitle, and in retrospect it seems obvious, like any other great idea. As Schweinebacke writes, “The optional argument of \usepackage is read by the LaTeX kernel, so hyperref cannot change scanning of the argument”. In other words, the title is read before hyperref gets a chance to apply its own handling of Unicode input. The problem can be eliminated simply by moving the title setup into a separate \hypersetup command, and behold, the pilcrow and the registered sign are gone, as seen in the following example.

$ diff wrong.tex right.tex
< \usepackage[unicode, pdftitle={Árvíztűrő tükörfúrógép}]{hyperref}
> \usepackage[unicode]{hyperref}
> \hypersetup{pdftitle={Árvíztűrő tükörfúrógép}}
$ pdfinfo right.pdf | grep Title
Title:          Árvíztűrő tükörfúrógép
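
For reference, a complete right.tex could look something like the sketch below. Only the \usepackage and \hypersetup lines come from the example above; the document class, the inputenc and fontenc lines, and the body text are my assumptions about a typical pdflatex setup, so adjust them to your own preamble.

```latex
% right.tex: minimal sketch; everything except the hyperref lines is assumed
\documentclass{article}
\usepackage[utf8]{inputenc}   % assumed: the source file is saved as UTF-8
\usepackage[T1]{fontenc}      % assumed: T1 provides glyphs for ő and ű in the body
% Only plain options go into \usepackage, since the LaTeX kernel reads them
\usepackage[unicode]{hyperref}
% Text with accented characters goes here, where hyperref controls the scanning
\hypersetup{pdftitle={Árvíztűrő tükörfúrógép}}
\begin{document}
Test document.
\end{document}
```

Other metadata keys that take free text, such as pdfauthor, can presumably be moved into the same \hypersetup call in the same way.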

