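Zend_Search_Lucene is designed to work with the UTF-8 charset. Index files store Unicode data in Java's "modified UTF-8 encoding". The Zend_Search_Lucene core fully supports this encoding, with one exception: only Basic Multilingual Plane characters are handled (see the note on supplementary characters below).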
However, the text analyzers and the query parser use ctype_alpha() to tokenize text and queries. ctype_alpha() does not support UTF-8 and needs to be replaced in the near future.
Until then, we strongly recommend converting your data to an ASCII representation (both when storing source documents and when querying):
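<?php
$doc = new Zend_Search_Lucene_Document();
...
// Transliterate the document text from ISO-8859-1 to ASCII before indexing.
$docText = iconv('ISO-8859-1', 'ASCII//TRANSLIT', $docText);
$doc->addField(Zend_Search_Lucene_Field::UnStored('contents', $docText));
...
// Convert the query the same way so that it matches the indexed text.
$query = iconv('', 'ASCII//TRANSLIT', $query);
$hits = $index->find($query);
?>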
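The limitation of ctype_alpha() can be seen in a minimal sketch. The variable names are illustrative, the locale is pinned to "C" so the result does not depend on the host, and the preg_match() pattern is shown only as one possible Unicode-aware check. It is an assumption for comparison, not what Zend_Search_Lucene uses.

```php
<?php
// Pin the C locale so ctype_alpha() behaves the same on every host.
setlocale(LC_CTYPE, 'C');

$asciiWord = 'cafe';
$utf8Word  = "caf\xC3\xA9"; // "café" in UTF-8; é is the two bytes 0xC3 0xA9

// ctype_alpha() inspects single bytes, so the bytes of é are not letters to it.
$asciiIsAlpha = ctype_alpha($asciiWord);
$utf8IsAlpha  = ctype_alpha($utf8Word);

// Illustrative alternative (an assumption, not the library's tokenizer):
// PCRE in UTF-8 mode (/u) matches any Unicode letter with \p{L}.
$utf8IsLetters = preg_match('/^\p{L}+$/u', $utf8Word) === 1;

var_dump($asciiIsAlpha);  // bool(true)
var_dump($utf8IsAlpha);   // bool(false)
var_dump($utf8IsLetters); // bool(true)
```

Because the tokenizer rejects the UTF-8 bytes, an unconverted word such as "café" would not be indexed or queried as a single term, which is why the ASCII conversion is recommended.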
Zend_Search_Lucene supports only Basic Multilingual Plane (BMP) characters (from 0x0000 to 0xFFFF) and does not support "supplementary characters" (characters whose code points are greater than 0xFFFF).
Java 2 represents these characters as a pair of char (16-bit) values, the first from the high-surrogates range (0xD800-0xDBFF), the second from the low-surrogates range (0xDC00-0xDFFF). Each surrogate is then encoded as an ordinary three-byte UTF-8 sequence, giving six bytes in total. The standard UTF-8 representation uses four bytes for supplementary characters.
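The difference between the two encodings can be worked through for a single code point. The following is a sketch only: the three helper functions are hypothetical and are not part of Zend_Search_Lucene, and U+10400 is an arbitrary supplementary character chosen for the example.

```php
<?php
// Hypothetical helpers for illustration; none of them belong to Zend_Search_Lucene.

// Encode a supplementary code point the way Java's modified UTF-8 does:
// split it into a surrogate pair, then write each surrogate as three bytes.
function encodeModifiedUtf8($codePoint)
{
    $offset = $codePoint - 0x10000;
    $high = 0xD800 | ($offset >> 10);   // high surrogate, 0xD800-0xDBFF
    $low  = 0xDC00 | ($offset & 0x3FF); // low surrogate, 0xDC00-0xDFFF

    $bytes = '';
    foreach (array($high, $low) as $unit) {
        $bytes .= chr(0xE0 | ($unit >> 12))
                . chr(0x80 | (($unit >> 6) & 0x3F))
                . chr(0x80 | ($unit & 0x3F));
    }
    return $bytes;
}

// Encode the same supplementary code point as standard four-byte UTF-8.
function encodeStandardUtf8($codePoint)
{
    return chr(0xF0 | ($codePoint >> 18))
         . chr(0x80 | (($codePoint >> 12) & 0x3F))
         . chr(0x80 | (($codePoint >> 6) & 0x3F))
         . chr(0x80 | ($codePoint & 0x3F));
}

// Report whether a standard UTF-8 string contains any character above 0xFFFF.
function containsSupplementaryCharacters($utf8Text)
{
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $utf8Text) === 1;
}

$codePoint = 0x10400; // DESERET CAPITAL LETTER LONG I

echo bin2hex(encodeModifiedUtf8($codePoint)), "\n"; // eda081edb080 (six bytes)
echo bin2hex(encodeStandardUtf8($codePoint)), "\n"; // f0909080 (four bytes)

var_dump(containsSupplementaryCharacters(encodeStandardUtf8($codePoint))); // bool(true)
var_dump(containsSupplementaryCharacters('plain ASCII'));                  // bool(false)
```

A check like containsSupplementaryCharacters() could be run on input text before indexing to spot characters that fall outside the supported range.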
The conversion is recommended if the data may contain non-ASCII characters or comes in UTF-8.