Accessing ODF File Members in Perl
==================================

From the Open Document Format specs I've downloaded[1], I learn that the
compressed version of an ODF file consists of the following members:

* content.xml
* styles.xml
* meta.xml
* settings.xml

So, I need modules for two purposes:

* Uncompressing
* Parsing XML

I can easily find them under the directories listed in the built-in array
@INC. If a module is not found there, I go to https://cpan.org and install
it using the cpan command. For example:

    cpan XML::Writer

The format of the compressed ODF file is known as "zip". So, to read it, I
use the module IO::Uncompress::Unzip, which I load with the statement:

    use IO::Uncompress::Unzip qw(unzip $UnzipError);

Then, I open a member using the statement:

    my $file_handle = IO::Uncompress::Unzip->new($filename, Name=>'content.xml');

So, I can read the member like a regular file without saving it to disk.
The entire subroutine I use is:

# ==== beginning of code ====
sub handle_content {
    my $filename = shift;
    # Open the member content.xml inside the zip archive for reading
    my $file_handle = IO::Uncompress::Unzip->new($filename, Name=>'content.xml')
        or die "unzip failed: $UnzipError";
    read_content($file_handle);
    $file_handle->close;
}
# ==== end of code ====

Then, the XML parser can use the file handle to read the XML file. To check
that I understand the documentation of XML::Parser, I choose the style
'Objects' when creating the parsing object. I do not use 'Subs' because that
means reading the XML file sequentially and writing subroutines to handle
events such as starting an element, ending an element, etc. So, the
statement:

    my $out=$pl->parse($file_handle);

returns a reference to the whole tree. To check the various nodes of the
tree, the function "ref", which I've found in the documentation for
"perlfunc", can help determine whether a node is a reference to a hash or to
a scalar, or not a reference at all.
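To make the behaviour of "ref" concrete, here is a minimal sketch that uses only core Perl. The helper describe and the class name main::Example are made up for illustration and are not part of the code in this article.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);   # core module, used only for the last line

# Hypothetical helper: report what kind of value was passed in,
# relying only on the built-in ref().
sub describe {
    my $value = shift;
    my $type  = ref $value;     # empty string when $value is not a reference
    return 'not a reference' if $type eq '';
    return "reference of type $type";
}

my %attrs = (name => 'value');
my @kids  = (1, 2, 3);
my $obj   = bless({}, 'main::Example');   # a hash blessed into a class

print describe('plain string'), "\n";     # not a reference
print describe(\%attrs), "\n";            # reference of type HASH
print describe(\@kids), "\n";             # reference of type ARRAY
print describe(\'text'), "\n";            # reference of type SCALAR
print describe($obj), "\n";               # reference of type main::Example
print reftype($obj), "\n";                # HASH (the underlying type)
```

Note that for a blessed reference, "ref" returns the class name instead of HASH. If the underlying type is needed, reftype from Scalar::Util reports it.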
The rest of the code is as follows:

# ==== beginning of code ====
use XML::Parser;   # needed for XML::Parser->new below

sub read_content {
    my $file_handle = shift;
    my $pl  = XML::Parser->new(Style=>'Objects');
    my $out = $pl->parse($file_handle);
    print "Returned by Parser: " . ref($out) . "\n";
    # $out is an array reference; print every top-level node
    foreach (@$out) {
        print_element($_);
    }
}

sub print_element {
    my $elem = shift;
    print "Our element is: " . $elem . "\n";
    print "ref = " . ref($elem) . "\n";
    print "\n";
    # Each node is a hash: list its keys and values
    foreach (keys %$elem) {
        print $_ . "=" . $elem->{$_} . "\n";
    }
}
# ==== end of code ====

Now, in the output, you can see that the parser returns a reference to an
array whose elements are references to hashes, each blessed into a class
named after its XML element. And the child nodes are stored in arrays. In
the following output, you can see the attributes of the root element (the
document root), and that it has a subtree with the key "Kids":

    Returned by Parser: ARRAY
    Our element is: main::office:document-content=HASH(0x37e78c51048)
    ref = main::office:document-content

    xmlns:draw=urn:oasis:names:tc:opendocument:xmlns:drawing:1.0
    xmlns:form=urn:oasis:names:tc:opendocument:xmlns:form:1.0
    xmlns:ooow=http://openoffice.org/2004/writer
    xmlns:xhtml=http://www.w3.org/1999/xhtml
    xmlns:chart=urn:oasis:names:tc:opendocument:xmlns:chart:1.0
    xmlns:of=urn:oasis:names:tc:opendocument:xmlns:of:1.2
    xmlns:tableooo=http://openoffice.org/2009/table
    xmlns:field=urn:openoffice:names:experimental:ooo-ms-interop:xmlns:field:1.0
    xmlns:table=urn:oasis:names:tc:opendocument:xmlns:table:1.0
    xmlns:fo=urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0
    xmlns:style=urn:oasis:names:tc:opendocument:xmlns:style:1.0
    Kids=ARRAY(0x37e78c51030)

Wow, a long list of namespaces. There are more.

[1] https://docs.oasis-open.org/office/v1.1/OS/OpenDocument-v1.1.odt