GH-50240: [C++] Make IPC message decoding stricter#50235
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also: |
d9884c5 to
f817f7c
Compare
1bd50da to
a40538f
Compare
|
@github-actions crossbow submit -g cpp |
|
Revision: a40538f Submitted crossbow builds: ursacomputing/crossbow @ actions-2ee7f5115a |
Error out when the advertised message metadata length is larger than expected.
a40538f to
487a7f8
Compare
|
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit f3f65df. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 9 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
An IPC file has a footer listing the exact locations in the file of the various IPC messages, such as RecordBatch messages.
However, we currently don't notice if a message size advertised in the IPC footer is larger than the actual serialized message size, therefore we happily accept invalid IPC files.
Found by OSS-Fuzz in https://issues.oss-fuzz.com/issues/524437775
What changes are included in this PR?
ReadFieldsSubsetdid not properly handle legacy IPC encapsulation (without a continuation indicator)MessageDecoderand the variousReadMessagefunctionsAre these changes tested?
Yes, by new test suite and by additional fuzz regression file.
Are there any user-facing changes?
Being stricter implies that some IPC files might be rejected that were accepted before. Hopefully such files don't exist, but some IPC writers might have emitted them anyway.