GH-50194: [C++] Move S3 and AWS-SDK to its own libarrow_s3.so#50195
GH-50194: [C++] Move S3 and AWS-SDK to its own libarrow_s3.so#50195raulcd wants to merge 4 commits into
Conversation
This comment was marked as off-topic.
This comment was marked as off-topic.
74a0d20 to
07fa9cc
Compare
…nd add required ARROW_EXPORTs
…covered on s3fs_module_test and s3fs_module_test doesn't make sense on static builds
|
@pitrou @kou I've been working on splitting the S3 library (and the AWS SDK) outside $ ls -lhL libarrow.so libarrow_s3.so
-rwxrwxr-x 1 raulcd raulcd 59M Jun 22 19:30 libarrow_s3.so
-rwxrwxr-x 1 raulcd raulcd 317M Jun 22 19:29 libarrow.soAnd we can see AWS symbols aren't present on libarrow.so $ nm -C libarrow.so | grep -c "Aws::"
0
$ nm -C libarrow_s3.so | grep -c "Aws::"
33991With current main libarrow.so size and it contains AWS SDK symbols: $ ls -lhL libarrow.so
-rwxrwxr-x 1 raulcd raulcd 368M Jun 22 19:45 libarrow.so
$ ls -lhL libarrow_s3.so
ls: cannot access 'libarrow_s3.so': No such file or directory
$ nm -C libarrow.so | grep -c "Aws::"
33991Those are debug builds but as a summary: |
|
I think that we should use I think that bindings can provide convenient API to use the S3 module even if we use |
With conda this isn't necessary, we already ship all the With wheels this is another different beast and I have to explore a little further. A related issue: The original problem we had with wheels is that there's no mechanism to share dependencies between wheels. Auditwheel/delvewheel/delocate mangle the .so name to avoid other wheels clashing with other dependencies symbols. The problem is that As a note, I've just validated we don't mangle libarrow (or any of our .so) on the wheels. I am going to start exploring this a little further to see if I can come up with something even though I am still unclear about some of the questions above, like version matching to avoid ABI problems. Related: @amol- who worked on And some Python PEP attempts to define some external dependencies for wheels are on discussion: What I am saying is that using cc @h-vetinari who knows this space and might shed some light |
|
For the PyPI side, you might be able to do something similar to what numpy/scipy are doing with openblas as a wheel. |
| target_link_libraries(arrow_s3fs PRIVATE ${AWSSDK_LINK_LIBRARIES} arrow_shared) | ||
| set_source_files_properties(filesystem/s3fs.cc filesystem/s3fs_module.cc | ||
| PROPERTIES SKIP_UNITY_BUILD_INCLUSION ON) | ||
| if(ARROW_BUILD_STATIC AND WIN32) |
There was a problem hiding this comment.
The AND WIN32 isn't useful, right?
There was a problem hiding this comment.
We use the same pattern on other places:
if(ARROW_BUILD_STATIC AND WIN32)
target_compile_definitions(arrow_compute_static PUBLIC ARROW_COMPUTE_STATIC)
endif()
or
if(ARROW_BUILD_STATIC AND WIN32)
target_compile_definitions(arrow_static PUBLIC ARROW_STATIC)
endif()
Taking a look at the definition on visibility.h of ARROW_S3_STATIC is already guarded for WIN32:
#if defined(_WIN32) || defined(__CYGWIN__)
So it will only be used on WIN32, it does not seem necessary on others so I would say the AND WIN32 does nothing but it's hygiene?
| string(APPEND ARROW_S3_PC_CFLAGS "${ARROW_S3_PC_CFLAGS_PRIVATE}") | ||
| set(ARROW_S3_PC_CFLAGS_PRIVATE "") |
There was a problem hiding this comment.
What is the purpose of this? It doesn't seem used below?
There was a problem hiding this comment.
From what I remember, this is used to populate the following:
Cflags:@ARROW_S3_PC_CFLAGS@
Cflags.private:@ARROW_S3_PC_CFLAGS_PRIVATE@
on arrow/cpp/src/arrow/arrow-s3.pc.in this is using the same mechanism we introduced for other libraries here:
3351aeb
Something to do with pkg-config and static builds, @kou might share some light on why this was necessary I hardly remember but can re-explore again.
There was a problem hiding this comment.
In general, Cflags.private is used only for static build (pkgconf --static ...). If we build only static library, pkgconf ... (no --static) doesn't work. It's inconvenient.
This is for making pkgconf ... (no --static) workable with static library only build. (It's a R package case.)
|
So, this is as if |
Yes but with a small caveat. |
|
Oh, great, thank you! |
Warning
Do not merge, this PR is currently on discussion of the current approach
Rationale for this change
Trying to reduce the size of
libarrow.soand remove AWS SDK on some builds. Allow for users to plug and play based on requirements and divide our functionality into cleaner modules.What changes are included in this PR?
Unconditionally build S3 and the AWS SDK into a different module
libarrow_s3.sooutside oflibarrow.so.Update bindings to link against the new
libarrow_s3.solibrary.Update the Linux Package jobs to have the new module into a different package.
Are these changes tested?
Yes via CI
Are there any user-facing changes?
Yes, users will need to either link against
libarrow_s3.soor register usingLoadFileSystemFactories