Libxml2 2.12.0 has been released, bringing significant changes to the XML parser. The update fixes known issues that caused quadratic behavior in the XML parser and reworks the internal hash tables to reduce memory consumption.
To improve compatibility, the new version includes the --with-legacy configuration option, which adds stubs for symbols that were previously removed.
Global variables are now stored in thread-local storage (TLS) when the compiler supports it, which avoids fatal error conditions resulting from lazy allocation. A new API function, xmlCheckThreadLocalStorage, lets users check for allocation failures early when compiler TLS is not supported. Some API functions now expect or return a const xmlError struct to prepare for future improvements.
The update also fixes cyclic dependencies in public header files. As a result, certain headers no longer include other headers as they did before.
Encoding handling has been improved: the encoding code has been refactored, and calling xmlSwitchEncoding from client code to override the encoding for the push parser is now fully supported.
When parsing data from memory, the library now streams the data chunk by chunk instead of copying the whole buffer, reducing peak memory consumption considerably.
A new API function, xmlCtxtSetMaxAmplification, allows parsing files that would otherwise trigger the "billion laughs" protection against excessive entity expansion. The regex determinism checks have also been improved, and invalid XML Schemas that previous versions erroneously accepted will now be rejected.
This release has deprecated certain features such as the “xmlLastError” global, global parser options, and the old Windows build system. These features will no longer be supported in future versions of Libxml2.
In addition to deprecations, Libxml2 2.12.0 also comes with several bug fixes. For instance, the parser no longer switches to ISO-8859-1 on encoding errors. The parser now supports encoded external parameter entities (PEs) in entity values, and the line number is updated after coalescing text nodes. Furthermore, the parser now checks for truncated multi-byte sequences, so that such encoding errors are detected early.
Another notable update is that multiple top-level elements are now allowed in SAX2. This makes it easier to process XML documents containing multiple root elements.
Other improvements in this release include making xmlError structs constant in several API functions. The xmlCurrentChar function has been improved by removing redundant checks, and the stack handling in xmlParseTryOrFinish has been fixed, making it more robust. Additionally, the parser now protects against quadratic default attribute expansion, which can prevent performance degradation in certain scenarios.
Other notable changes include making xmlFreeEntity public, which allows more flexibility in handling entities. The parser has also been updated to avoid undefined behavior in xmlParseStartTag2 and to improve error handling, making it more robust and reliable. Moreover, the library now uses thread-local storage when it is available, which can help improve performance in multi-threaded environments.
The release also includes several further bug fixes, such as fixes for memory leaks in xmlCompileAttributeTest and xmlXIncludeNewRef. The global state destruction on Windows has also been reworked, and the library now defines globals using macros, making it easier to manage global state.
The update focuses on enhancing portability, build systems, and tests while improving documentation.
One of the major highlights of this update is the improved compatibility with Python 3.12, thanks to Daniel Garcia Moreno. The build system has also been refined with several additions. These include checking for static linking dependencies found in config files and disabling lzma support when the --with-minimum option is used with Autotools.
The update also includes several build fixes, such as removing some GCC warnings, handling the NOCONFIG case when setting locations from CMake target properties, and fixing the Python tests on MinGW.
The tests have been expanded to cover xmlNextChar in testchar.c and now include a new testparser.c for additional tests, hash table tests, and streaming schema validation tests.
Additionally, the update includes a few improvements to the documentation, such as adding notes about runtest to MAINTAINERS.md, improving the documentation of configuration options, and accepting ‘unsigned’ without ‘int’ in the documentation tooling.