Thursday, April 7, 2011

Mime Type Problem when Uploading DOC, DOCX, ODT Files in Symfony

If you create a DOC file with OpenOffice, its MIME type will be detected as application/octet-stream, which can stand for any kind of binary data. Of course, you can't allow this MIME type for uploads.
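Because application/octet-stream says nothing about the content, one way to recognise a legacy DOC file yourself is to look at its first bytes: binary Word 97-2003 files are OLE2 compound documents, which begin with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1. Below is a minimal sketch; looksLikeOle2Document is a hypothetical name, not a symfony function, and since XLS and PPT files use the same container, a match only narrows the type down rather than proving the file is a DOC.

```php
<?php
// Hypothetical helper (not part of symfony): does the file start with the
// OLE2 compound-document signature used by legacy .doc (and .xls/.ppt) files?
function looksLikeOle2Document($path)
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return false; // unreadable or missing file
    }
    // Read the first 8 bytes; shorter files simply won't match
    $header = fread($handle, 8);
    fclose($handle);

    return $header === "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";
}
```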

If you try to upload DOCX and ODT files, Symfony will recognise them as ZIP archives. To correct this, and also the OpenOffice DOC MIME type problem, add the 'mime_type_guessers' option to sfValidatorFile:
$this->setValidator('filename', new sfValidatorFile(array(
      'max_size' => ...,
      'path' => ...,
      'mime_type_guessers' => array('guessFromFileinfo'),
      'required' => ...,
      ...
)));
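DOCX and ODT are reported as ZIP because both formats really are ZIP containers. If you need to tell them apart yourself, you can look inside the archive: an ODF package such as ODT stores its media type in an entry named mimetype, and a Word DOCX keeps its main text in word/document.xml. The sketch below assumes the zip extension is available; guessOfficeMimeType is a hypothetical name, not one of symfony's guessers, and how you would hook it into sfValidatorFile is not shown here.

```php
<?php
// Hypothetical helper (not part of symfony): refine a "ZIP" detection into
// ODT or DOCX by inspecting the archive contents. Requires the zip extension.
function guessOfficeMimeType($path)
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return null; // not a readable ZIP archive
    }

    // ODF packages (ODT, ODS, ...) carry their media type in a "mimetype" entry
    $odfType = $zip->getFromName('mimetype');
    // OOXML word-processing documents keep their body in word/document.xml
    $isDocx = $zip->locateName('word/document.xml') !== false;
    $zip->close();

    if ($odfType !== false && trim($odfType) !== '') {
        return trim($odfType);
    }
    if ($isDocx) {
        return 'application/vnd.openxmlformats-officedocument.wordprocessingml.document';
    }
    // Some other ZIP archive
    return 'application/zip';
}
```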

Convert ODT to TEXT with PHP

Shorter way (requires odt2txt installed and shell_exec enabled):
echo shell_exec("odt2txt --encoding=utf8 test.odt");
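If the file name comes from an upload or other user input, it should be escaped before it reaches the shell. A minimal sketch using escapeshellarg(); odt2txtCommand and odt2textViaCli are hypothetical names, and the quoting described here assumes a Unix-like system, where escapeshellarg() wraps the argument in single quotes.

```php
<?php
// Hypothetical helpers: build and run the odt2txt command with an escaped file name.
function odt2txtCommand($filename)
{
    // escapeshellarg() quotes the name so spaces or quotes cannot break the command
    return 'odt2txt --encoding=utf8 ' . escapeshellarg($filename);
}

function odt2textViaCli($filename)
{
    // shell_exec() returns null (or false) instead of a string on failure or empty output
    $output = shell_exec(odt2txtCommand($filename));
    return is_string($output) ? $output : '';
}
```

On Linux, odt2txtCommand('my report.odt') yields odt2txt --encoding=utf8 'my report.odt', so the space no longer splits the argument.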
Longer way (requires PHP 5.2+ and ZIP extension enabled):
function odt2text($filename) {
    return readZippedXML($filename, "content.xml");
}

function readZippedXML($archiveFile, $dataFile) {
    // Create a ZipArchive object for reading the archive
    $zip = new ZipArchive;

    // Open the received archive file
    if (true === $zip->open($archiveFile)) {
        // If that worked, search for the data file in the archive
        if (($index = $zip->locateName($dataFile)) !== false) {
            // If found, read it into a string
            $data = $zip->getFromIndex($index);
            // Close the archive file
            $zip->close();
            // Load the XML from the string, skipping errors and warnings.
            // loadXML() is an instance method, so it needs a DOMDocument object.
            // LIBXML_NOENT and LIBXML_XINCLUDE are deliberately not used: entity
            // substitution and XInclude processing are unsafe on uploaded files.
            $xml = new DOMDocument();
            if ($data !== false && $data !== "" && $xml->loadXML($data, LIBXML_NOERROR | LIBXML_NOWARNING)) {
                // Return the data without the XML formatting tags
                return strip_tags($xml->saveXML());
            }
            return "";
        }
        $zip->close();
    }

    // In case of failure return an empty string
    return "";
}

echo odt2text("test.odt");
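strip_tags() removes the markup but also the paragraph boundaries, so consecutive paragraphs run together in the output. If you need one line per paragraph, one alternative is to walk the paragraph (text:p) and heading (text:h) elements with DOMXPath. This is a sketch of a different technique, not the method above: odt2textParagraphs is a hypothetical name, it ignores spacing elements such as text:s and text:tab, and it may repeat the text of paragraphs nested inside footnotes.

```php
<?php
// Hypothetical variant: extract one line per paragraph/heading from an ODT file.
function odt2textParagraphs($filename)
{
    $zip = new ZipArchive();
    if ($zip->open($filename) !== true) {
        return '';
    }
    $data = $zip->getFromName('content.xml');
    $zip->close();
    if ($data === false || $data === '') {
        return '';
    }

    $doc = new DOMDocument();
    // No LIBXML_NOENT/LIBXML_XINCLUDE: entity expansion is unsafe on uploads
    if (!$doc->loadXML($data, LIBXML_NOERROR | LIBXML_NOWARNING)) {
        return '';
    }

    $xpath = new DOMXPath($doc);
    // ODF text namespace: text:p = paragraph, text:h = heading
    $xpath->registerNamespace('text', 'urn:oasis:names:tc:opendocument:xmlns:text:1.0');

    $lines = array();
    // The union query returns the nodes in document order
    foreach ($xpath->query('//text:p | //text:h') as $node) {
        $lines[] = $node->textContent;
    }
    return implode("\n", $lines);
}
```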