Skip to content
Open
Show file tree
Hide file tree
Changes from 116 commits
Commits
Show all changes
119 commits
Select commit Hold shift + click to select a range
a6d6073
add initial code.
erip Jul 18, 2022
1f6a883
fix typo.
erip Jul 18, 2022
696b21a
fix a few reference errors.
erip Jul 18, 2022
2e3cb4f
add stupid constructor
emjotde Jul 18, 2022
0d648bb
latest changes
emjotde Jul 18, 2022
b656975
add pybind11 submodule
emjotde Jul 18, 2022
11cfa21
working CMake compilation + installation
emjotde Jul 19, 2022
351eb35
add test script
emjotde Jul 19, 2022
f03c369
disable TCMALLOC in module
emjotde Jul 19, 2022
820bef1
add config parsing
emjotde Jul 19, 2022
e52f7d1
rename script
emjotde Jul 19, 2022
e5cc7c4
overload run to accept invidual (potentially newline-joined) strings …
erip Jul 19, 2022
90919ef
add python builds to gitignore.
erip Jul 19, 2022
bc48bf8
fix overload resolution errors in build.
erip Jul 19, 2022
3bff35e
batching test
emjotde Jul 19, 2022
52f1d6f
Merge branch 'feature/python-bindings' of github.com:marian-nmt/maria…
emjotde Jul 19, 2022
b1eedce
add python builds to gitignore.
erip Jul 19, 2022
efdc157
add coarse benchmarking to test inference. (#949)
erip Jul 19, 2022
703643a
add vector translate. TODO: revisit the underlying code to not join a…
erip Jul 19, 2022
4fd32cc
merge
emjotde Jul 19, 2022
792630c
fix bad merge.
erip Jul 19, 2022
5fb41b9
refactor build to skbuild instead of raw setuptools and cmake extensi…
erip Jul 19, 2022
b8b8522
update README build instructions.
erip Jul 19, 2022
ba28c92
parallel model reading
emjotde Jul 19, 2022
1073489
Merge branch 'feature/python-bindings' of github.com:marian-nmt/maria…
emjotde Jul 19, 2022
37f8a07
testing before cmake
alvations Jul 19, 2022
3ad4ac7
remove the pymarian from this branch, working on the fork of this first
alvations Jul 19, 2022
d2aef2a
half-working installation
emjotde Jul 20, 2022
104bc73
Merge branch 'feature/python-bindings' of github.com:marian-nmt/maria…
emjotde Jul 20, 2022
27f8221
minimal installation
emjotde Jul 20, 2022
4de8da0
remove pybind submodule
emjotde Jul 20, 2022
a52db82
Adding initial HF wrapper code
alexandremuzio Jul 20, 2022
e7fd6b4
fix includes
emjotde Jul 20, 2022
ba9ba07
Merge branch 'feature/python-bindings' of https://github.com/marian-n…
emjotde Jul 20, 2022
d27b6b4
Adding initial version of kwargs
alexandremuzio Jul 21, 2022
ccabd9e
add huggingface's opus models to the mix.
erip Jul 21, 2022
10f238f
added the pythonic init
alvations Jul 21, 2022
cc8e323
Merge branch 'feature/python-bindings' of https://github.com/marian-n…
alvations Jul 21, 2022
92303fc
fixed some typos
alvations Jul 21, 2022
55be37f
remove the wrap wrapper
alvations Jul 21, 2022
b3d61fe
Adding initial HF wrapper code
alexandremuzio Jul 20, 2022
591a6be
Adding initial version of kwargs
alexandremuzio Jul 21, 2022
6e29b50
add huggingface's opus models to the mix.
erip Jul 21, 2022
d7aeaa2
Merge branch 'feature/python-bindings' of https://github.com/marian-n…
alvations Jul 21, 2022
63dc4a2
added pymarian ci
alvations Jul 21, 2022
5b9db4d
added pymarian ci
alvations Jul 21, 2022
75ece83
moved the pymarian to the end
alvations Jul 21, 2022
ccb3353
do not lose empty lines
emjotde Jul 21, 2022
a4b9168
Merge branch 'feature/python-bindings' of https://github.com/marian-n…
emjotde Jul 21, 2022
b3226a8
added pymarian to rest of the os
alvations Jul 21, 2022
4741b4e
Merge branch 'feature/python-bindings' of https://github.com/marian-n…
alvations Jul 21, 2022
fedaac7
install python after installing marian
alvations Jul 21, 2022
b9929ec
cos gh dont support 10.15 so 11 is min
alvations Jul 21, 2022
b9e0acb
typos...
alvations Jul 21, 2022
4d291db
add windowed demo
emjotde Jul 21, 2022
15a0582
Merge branch 'feature/python-bindings' of https://github.com/marian-n…
emjotde Jul 21, 2022
f76a9e9
so that ci is happy with the specific place to look for setup.py
alvations Jul 21, 2022
87f8b12
Merge branch 'feature/python-bindings' of https://github.com/marian-n…
alvations Jul 21, 2022
7a05ce7
just specify the path
alvations Jul 21, 2022
2441097
added Wl whole-archive to avoid undefined symbols
alvations Jul 21, 2022
f1c41a4
removed the toml
alvations Jul 21, 2022
72cc9ff
only use one global logger of the same name
emjotde Jul 21, 2022
ab238e8
Merge branch 'feature/python-bindings' of https://github.com/marian-n…
emjotde Jul 21, 2022
958e7de
remove the -wl
alvations Jul 21, 2022
4cbe320
Merge branch 'feature/python-bindings' of https://github.com/marian-n…
alvations Jul 21, 2022
45dc2ed
revert to pip install -e .
alvations Jul 21, 2022
b857b07
trying with no full archive
alvations Jul 21, 2022
b6e110a
added MTAPI server
mjpost Jul 21, 2022
d4c9d08
pymarian only test
alvations Jul 22, 2022
c90f42c
test only things I care
alvations Jul 22, 2022
5fbe03e
Finally adding kwargs to translate api
alexandremuzio Jul 22, 2022
3ced0ef
Merge branch 'feature/python-bindings' of https://github.com/marian-n…
alexandremuzio Jul 22, 2022
54be85f
add caching to demo
emjotde Jul 22, 2022
bbfd305
Merge branch 'feature/python-bindings' of https://github.com/marian-n…
emjotde Jul 22, 2022
2ae7bab
paragraphs
emjotde Jul 22, 2022
183dde2
space out code
emjotde Jul 22, 2022
5b1b758
try downgrading the versions
alvations Jul 22, 2022
4c7339e
Merge branch 'feature/python-bindings' of https://github.com/marian-n…
alvations Jul 22, 2022
612498c
try again ci...
alvations Jul 22, 2022
8e1ef4b
specify build location
alvations Jul 22, 2022
033657a
add cmake flags back as appropriate.
erip Jul 22, 2022
38ed661
fix syntax error from missing comma.
erip Jul 22, 2022
4d527a2
attack CI.
erip Jul 22, 2022
737f852
refactor tests to just import pymarian.
erip Jul 22, 2022
9c3e0c4
add missing modules
emjotde Jul 22, 2022
e7844c6
Merge branch 'feature/python-bindings' of https://github.com/marian-n…
emjotde Jul 22, 2022
6eebba5
Added server implementing MS MTAPI
mjpost Jul 22, 2022
30eee61
requirements file for server
mjpost Jul 22, 2022
0bc2ad9
Merge branch 'feature/python-bindings' of github.com:marian-nmt/maria…
mjpost Jul 22, 2022
e7e8de0
updated reqs
mjpost Jul 22, 2022
b2bbc9b
point setuptools to the correct package directory.
erip Jul 22, 2022
3620316
add missing wheel
erip Jul 22, 2022
87e31e6
pass in CUDA_VERSION to tag version appropriately.
erip Jul 22, 2022
9bed4f9
fix setup requirements (vs. install requirements) and specify minimum…
erip Jul 22, 2022
dfd8ecb
remove duplicated kwarg.
erip Jul 22, 2022
9181d87
set environment variable properly on Windows.
erip Jul 22, 2022
cd6984e
add cross-shell env vars.
erip Jul 22, 2022
ce0fe7f
fix version issue.
erip Jul 22, 2022
ed88f1d
temporarily exclude Windows pymarian build from CI.
erip Jul 23, 2022
7ca3005
add status bar
emjotde Jul 24, 2022
1b9489c
Merge branch 'feature/python-bindings' of https://github.com/marian-n…
emjotde Jul 24, 2022
e0ba10f
revert packaging issue.
erip Jul 24, 2022
c48227f
reduce number of parallel workers to mirror normal cmake build.
erip Jul 24, 2022
b349d8b
remove __init__.py
emjotde Jul 24, 2022
50d7332
Merge branch 'feature/python-bindings' of https://github.com/marian-n…
emjotde Jul 24, 2022
0e63804
change to LF
emjotde Jul 24, 2022
5b91cf7
add proper tests.
erip Jul 24, 2022
15e87fe
revert to modern OS X
erip Jul 25, 2022
b1c871a
clean up build directory to save disk.
erip Jul 25, 2022
2ccc541
do not work in a dead directory.
erip Jul 31, 2022
4199d37
Merge branch 'master' into feature/python-bindings
snukky Sep 5, 2022
3599f09
Include 3rd_party/sentencepiece again
snukky Sep 5, 2022
b398263
remove spm_encode from test since we are statically linking
erip Sep 13, 2022
3d8c64b
clean up build dir and pick correct working dir when building pymarian.
erip Sep 20, 2022
08c4ef0
clean up build dir and pick correct working dir when building pymarian.
erip Sep 21, 2022
87ff8ad
use unix make instead of ninja due to bug encountered during dev.
erip Sep 21, 2022
4661fa9
Merge branch 'master' into feature/python-bindings
erip Nov 10, 2022
2c5863c
Merge branch 'master' into feature/python-bindings
emjotde Aug 21, 2023
4b9588f
Merge branch 'feature/python-bindings' of github.com:marian-nmt/maria…
emjotde Aug 21, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 12 additions & 1 deletion .github/workflows/macos.yml
Original file line number Diff line number Diff line change
Expand Up @@ -49,5 +49,16 @@ jobs:
./marian --version
./marian-decoder --version
./marian-scorer --version
./spm_encode --version

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the motivation for this to get removed?

ls -hlv $(find . -maxdepth 1 -type f -perm +ugo+x \( -name "marian*" -o -name "spm*" \))
cd ..
rm -rf build/

- name: Install PyMarian
working-directory: src/python
run: |
python3 -m venv .venv && source .venv/bin/activate
python3 -m pip install -U setuptools wheel
python3 -m pip install pybind11 sentencepiece scikit-build
python3 setup.py install -j2 -- -G"Unix Makefiles"
cd ../../
python3 -c "from pymarian import Translator"
18 changes: 15 additions & 3 deletions .github/workflows/ubuntu.yml
Original file line number Diff line number Diff line change
Expand Up @@ -115,6 +115,7 @@ jobs:
-DCOMPILE_CPU=${{ matrix.cpu }} \
-DCOMPILE_CUDA=${{ matrix.gpu }} \
-DCOMPILE_EXAMPLES=${{ matrix.examples }} \
-DUSE_TCMALLOC=OFF \

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python bindings doesn't work with TCMalloc? Why? Would be great to have it fixed or at least commented/documented.

-DCOMPILE_SERVER=on \
-DCOMPILE_TESTS=${{ matrix.unit_tests }} \
-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-${{ matrix.cuda }} \
Expand All @@ -139,7 +140,18 @@ jobs:
run: |
./marian --version
./marian-decoder --version
./marian-scorer --version
./marian-server --version
./spm_encode --version
Comment on lines -142 to -144

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, why do we need to remove these checks?

ls -hlv $(find . -maxdepth 1 -type f -executable \( -name "marian*" -o -name "spm*" \))


- name: Install PyMarian
working-directory: build
env:
CUDA_VERSION: ${{ matrix.cuda }}
run: |
python3 -m venv .venv && source .venv/bin/activate
python3 -m pip install -U setuptools wheel
python3 -m pip install pybind11 sentencepiece scikit-build
cd ../src/python
python3 setup.py build -j2 install
cd ../../
python3 -c "from pymarian import Translator"
16 changes: 16 additions & 0 deletions .github/workflows/windows.yml
Original file line number Diff line number Diff line change
Expand Up @@ -134,4 +134,20 @@ jobs:
.\marian-decoder.exe --version
.\marian-scorer.exe --version
dir *.exe
cd ..
rd /s /q build
Comment on lines +137 to +138

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for debugging? I would suggest adding a short comment above explaining what it does and why it is done.

shell: cmd

- name: Install PyMarian
working-directory: src/python
run: |
python3 -m venv .venv
.venv\Scripts\activate.bat
python3 -m pip install -U setuptools wheel
python3 -m pip install pybind11 sentencepiece scikit-build
python3 setup.py install -j2
cd ..\..\
python3 -c "from pymarian import Translator"
env:
CUDA_VERSION: ${{ matrix.cuda }}
shell: cmd
6 changes: 5 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,4 +1,3 @@
# Config files from CMake

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is still valid, isn't it?

src/common/project_version.h
src/common/git_revision.h
src/common/build_info.cpp
Expand Down Expand Up @@ -63,3 +62,8 @@ examples/mnist/*ubyte
.vs
.vscode

src/python/build/

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: consider adding a comment above explaining that group of excluded files, for example:

Suggested change
src/python/build/
# PyMarian temporary files
src/python/build/

src/python/dist/
src/python/_skbuild/
src/python/pymarian.egg-info/
src/python/bench/*.txt
3 changes: 2 additions & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ option(USE_MKL "Compile with MKL support" ON)
option(USE_MPI "Use MPI library" OFF)
option(USE_NCCL "Use NCCL library" ON)
option(USE_SENTENCEPIECE "Download and compile SentencePiece" ON)
option(USE_TCMALLOC "Use TCMALLOC if available" ON)
Comment thread
erip marked this conversation as resolved.
option(USE_STATIC_LIBS "Link statically against non-system libs" OFF)
option(GENERATE_MARIAN_INSTALL_TARGETS "Generate Marian install targets (requires CMake 3.12+)" OFF)
option(DETERMINISTIC "Try to make training results as deterministic as possible (e.g. for testing)" OFF)
Expand Down Expand Up @@ -105,7 +106,7 @@ if(MSVC)
set(INTRINSICS "/arch:AVX2")
# set(INTRINSICS "/arch:AVX512")
# /bigobj is necessary for expression_operators.cpp. See https://stackoverflow.com/questions/15110580/penalty-of-the-msvs-compiler-flag-bigobj
set(CMAKE_CXX_FLAGS "/EHsc /DWIN32 /D_WINDOWS /DUNICODE /D_UNICODE /D_CRT_NONSTDC_NO_WARNINGS /D_CRT_SECURE_NO_WARNINGS /bigobj ${DISABLE_GLOBALLY}")
set(CMAKE_CXX_FLAGS "/permissive- /EHsc /DWIN32 /D_WINDOWS /DUNICODE /D_UNICODE /D_CRT_NONSTDC_NO_WARNINGS /D_CRT_SECURE_NO_WARNINGS /bigobj ${DISABLE_GLOBALLY}")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS} /MT /O2 ${INTRINSICS} /Zi /MP /GL /DNDEBUG")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS} /MTd /Od /Ob0 ${INTRINSICS} /RTC1 /Zi /D_DEBUG")

Expand Down
36 changes: 22 additions & 14 deletions src/3rd_party/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -1,13 +1,17 @@

include_directories(.)

add_subdirectory(./yaml-cpp)
add_subdirectory(./SQLiteCpp)
add_subdirectory(./pathie-cpp)
add_subdirectory(./zlib)
add_subdirectory(./faiss)
add_subdirectory(./yaml-cpp EXCLUDE_FROM_ALL)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I believe it's worth explicitly mentioning what this flag does in a comment:

Suggested change
add_subdirectory(./yaml-cpp EXCLUDE_FROM_ALL)
# subdirectories with 3rd party code will not be included in the ALL target of the parent directory by default,
add_subdirectory(./yaml-cpp EXCLUDE_FROM_ALL)

add_subdirectory(./SQLiteCpp EXCLUDE_FROM_ALL)
add_subdirectory(./pathie-cpp EXCLUDE_FROM_ALL)
add_subdirectory(./zlib EXCLUDE_FROM_ALL)
add_subdirectory(./faiss EXCLUDE_FROM_ALL)
include_directories(./faiss)

# if(SKBUILD)
# add_subdirectory(./pybind11 EXCLUDE_FROM_ALL)
# endif(SKBUILD)

Comment on lines +11 to +14

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it was a temporary code only, we should remove it.

Suggested change
# if(SKBUILD)
# add_subdirectory(./pybind11 EXCLUDE_FROM_ALL)
# endif(SKBUILD)

if(COMPILE_CPU)
if(NOT GENERATE_MARIAN_INSTALL_TARGETS)
set(INTGEMM_DONT_BUILD_TESTS ON CACHE BOOL "Disable intgemm tests")
Expand Down Expand Up @@ -42,7 +46,7 @@ if(USE_FBGEMM)

set(FBGEMM_BUILD_TESTS OFF CACHE BOOL "Disable fbgemm tests")
set(FBGEMM_BUILD_BENCHMARKS OFF CACHE BOOL "Disable fbgemm benchmark")
add_subdirectory(./fbgemm)
add_subdirectory(./fbgemm EXCLUDE_FROM_ALL)

# asmjit (3rd-party submodule of fbgemm) sets -Wall -Wextra near the end of
# the compile options, invalidating any -Wno-... flags that we may have set
Expand Down Expand Up @@ -72,15 +76,19 @@ if(USE_SENTENCEPIECE)

# regardless of -DUSE_STATIC_LIBS setting always build sentencepiece statically
set(SPM_ENABLE_SHARED OFF CACHE BOOL "Builds shared libaries in addition to static libraries." FORCE)
set(SPM_ENABLE_TCMALLOC ON CACHE BOOL "Enable TCMalloc if available.")

if(USE_STATIC_LIBS)
set(SPM_TCMALLOC_STATIC ON CACHE BOOL "Link static library of TCMALLOC." FORCE)
else(USE_STATIC_LIBS)
set(SPM_TCMALLOC_STATIC OFF CACHE BOOL "Link static library of TCMALLOC.")
endif(USE_STATIC_LIBS)

add_subdirectory(./sentencepiece)
if(USE_TCMALLOC)
set(SPM_ENABLE_TCMALLOC ON CACHE BOOL "Enable TCMalloc if available.")
if(USE_STATIC_LIBS)
set(SPM_TCMALLOC_STATIC ON CACHE BOOL "Link static library of TCMALLOC." FORCE)
else(USE_STATIC_LIBS)
set(SPM_TCMALLOC_STATIC OFF CACHE BOOL "Link static library of TCMALLOC.")
endif(USE_STATIC_LIBS)
else(USE_TCMALLOC)
set(SPM_ENABLE_TCMALLOC OFF CACHE BOOL "Enable TCMalloc if available.")
endif(USE_TCMALLOC)

add_subdirectory(./sentencepiece EXCLUDE_FROM_ALL)
Comment thread
erip marked this conversation as resolved.
include_directories(./sentencepiece)

set_target_properties(spm_encode spm_decode spm_train spm_normalize spm_export_vocab
Expand Down
37 changes: 31 additions & 6 deletions src/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,9 @@ include_directories(.)
include_directories(3rd_party)
include_directories(3rd_party/SQLiteCpp/include)
include_directories(3rd_party/sentencepiece)

include_directories(3rd_party/pybind11/include)

if(USE_SENTENCEPIECE)
include_directories(3rd_party/sentencepiece/third_party/protobuf-lite)
endif(USE_SENTENCEPIECE)
Expand Down Expand Up @@ -275,11 +278,11 @@ if (NOT COMPILE_LIBRARY_ONLY)
endif(COMPILE_SERVER)

foreach(exec ${EXECUTABLES})
target_link_libraries(${exec} marian)
if(CUDA_FOUND)
target_link_libraries(${exec} marian_cuda)
endif(CUDA_FOUND)
set_target_properties(${exec} PROPERTIES RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}")
target_link_libraries(${exec} marian)
if(CUDA_FOUND)
target_link_libraries(${exec} marian_cuda)
endif(CUDA_FOUND)
set_target_properties(${exec} PROPERTIES RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}")
Comment on lines +261 to +265

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverting the removal of line indentations:

Suggested change
target_link_libraries(${exec} marian)
if(CUDA_FOUND)
target_link_libraries(${exec} marian_cuda)
endif(CUDA_FOUND)
set_target_properties(${exec} PROPERTIES RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}")
target_link_libraries(${exec} marian)
if(CUDA_FOUND)
target_link_libraries(${exec} marian_cuda)
endif(CUDA_FOUND)
set_target_properties(${exec} PROPERTIES RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}")

endforeach(exec)
endif(NOT COMPILE_LIBRARY_ONLY)

Expand All @@ -297,9 +300,31 @@ endif(COMPILE_EXAMPLES)

if(GENERATE_MARIAN_INSTALL_TARGETS)
# Install the marian library if given a "make install" target
include(GNUInstallDirs) # This defines default values for installation directories (all platforms even if named GNU)
include(GNUInstallDirs) # This defines default values for installation directories (all platforms even if named GNU)
install(TARGETS marian
EXPORT marian-targets
ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
)
endif(GENERATE_MARIAN_INSTALL_TARGETS)

if(SKBUILD)
# Scikit-Build does not add your site-packages to the search path
# automatically, so we need to add it _or_ the pybind11 specific directory
# here.
execute_process(
COMMAND "${PYTHON_EXECUTABLE}" -c
"import pybind11; print(pybind11.get_cmake_dir())"
OUTPUT_VARIABLE _tmp_dir
OUTPUT_STRIP_TRAILING_WHITESPACE COMMAND_ECHO STDOUT)
list(APPEND CMAKE_PREFIX_PATH "${_tmp_dir}")

# Now we can find pybind11
find_package(pybind11 CONFIG REQUIRED)

pybind11_add_module(pymarian MODULE python/pymarian/bind.cpp)
target_link_libraries(pymarian PUBLIC marian)
if(CUDA_FOUND)
target_link_libraries(pymarian PUBLIC marian_cuda)
endif(CUDA_FOUND)
install(TARGETS pymarian DESTINATION .)
endif(SKBUILD)
14 changes: 12 additions & 2 deletions src/common/config.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -316,12 +316,22 @@ std::vector<DeviceId> Config::getDevices(Ptr<Options> options,
return devices;
}

Ptr<Options>
parseOptions(int argc, char** argv, cli::mode mode, bool validate){
Ptr<Options> parseOptions(int argc, char** argv, cli::mode mode, bool validate) {
ConfigParser cp(mode);
return cp.parseOptions(argc, argv, validate);
}

Ptr<Options> parseOptions(const std::string& args, cli::mode mode, bool validate) {
std::vector<std::string> vArgs = utils::split(args, " ");

std::string dummy("dummy");

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: having a dummy string is a bit confusing at a first glance, can we simplify these couple of lines to make them more readable or add a comment?

std::vector<char*> cArgs = { &dummy[0] };
for(auto& arg : vArgs)
cArgs.push_back(&arg[0]);

return parseOptions((int)cArgs.size(), cArgs.data(), mode, validate);
}

std::ostream& operator<<(std::ostream& out, const Config& config) {
YAML::Emitter outYaml;
cli::OutputYaml(config.get(), outYaml);
Expand Down
15 changes: 15 additions & 0 deletions src/common/config.h
Original file line number Diff line number Diff line change
Expand Up @@ -114,4 +114,19 @@ Ptr<Options> parseOptions(int argc,
cli::mode mode,
bool validate = true);

/**
* Parse the command line options.
* Same as above, but args provided as C++ string object, space-delimited. This is used for instance
* in the python bindings as a simple string-based interface.
*
* @param args space delimited command line options
* @param mode change the set of available command-line options, e.g. training, translation, etc.
* @param validate validate parsed options and abort on failure
*
* @return parsed options
*/
Ptr<Options> parseOptions(const std::string& args,
cli::mode mode,
bool validate = true);

} // namespace marian
18 changes: 11 additions & 7 deletions src/common/logging.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -26,11 +26,13 @@ std::shared_ptr<spdlog::logger> createStderrLogger(const std::string& name,
const std::string& pattern,
const std::vector<std::string>& files,
bool quiet) {
std::vector<spdlog::sink_ptr> sinks;
auto logger = spdlog::get(name);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I believe this could deserve a short comment why it's done.

if(!logger) {
std::vector<spdlog::sink_ptr> sinks;

auto stderr_sink = spdlog::sinks::stderr_sink_mt::instance();
if(!quiet)
sinks.push_back(stderr_sink);
auto stderr_sink = spdlog::sinks::stderr_sink_mt::instance();
if(!quiet)
sinks.push_back(stderr_sink);

// @TODO: think how to solve this better than using OMPI_COMM_WORLD_RANK env variable

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: these lines also need an extra indentation

// only create output files if we are the main process or if MPI rank is not defined
Expand All @@ -42,10 +44,11 @@ std::shared_ptr<spdlog::logger> createStderrLogger(const std::string& name,
}
}

auto logger = std::make_shared<spdlog::logger>(name, begin(sinks), end(sinks));
logger = std::make_shared<spdlog::logger>(name, begin(sinks), end(sinks));

spdlog::register_logger(logger);
logger->set_pattern(pattern);
spdlog::register_logger(logger);
logger->set_pattern(pattern);
}
return logger;
}

Expand All @@ -72,6 +75,7 @@ bool setLoggingLevel(spdlog::logger& logger, std::string const level) {
}

static void setErrorHandlers();

void createLoggers(const marian::Config* config) {
std::vector<std::string> generalLogs;
std::vector<std::string> validLogs;
Expand Down
3 changes: 2 additions & 1 deletion src/models/model_task.h
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ struct ModelTask {

struct ModelServiceTask {
virtual ~ModelServiceTask() {}
virtual std::string run(const std::string&) = 0;
virtual std::string run(const std::string&, const std::string&) = 0;
virtual std::vector<std::string> run(const std::vector<std::string>&, const std::string&) = 0;
Comment on lines +14 to +15

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add docstrings describing function parameters.

};
} // namespace marian
20 changes: 20 additions & 0 deletions src/python/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
Stuff required to build module (on Linux)
```
python3 -m venv ./venv
source ./venv/bin/activate
python -mpip install sentencepiece scikit-build pybind11
```

Stuff required to run windowed demo
```
python -mpip install pyqt5 sacremoses git+https://github.com/mediacloud/sentence-splitter
```

Build the module (CPU version)
```
python setup.py build -j16 install
```

```
echo "Hello World." | python test_translate.py '--config /home/marcinjd/MTMA/decoder.yml' /home/marcinjd/MTMA/source.spm /home/marcinjd/MTMA/target.spm

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove hard-coded paths:

Suggested change
echo "Hello World." | python test_translate.py '--config /home/marcinjd/MTMA/decoder.yml' /home/marcinjd/MTMA/source.spm /home/marcinjd/MTMA/target.spm
echo "Hello World." | python test_translate.py '--config decoder.yml' source.spm target.spm

```
14 changes: 14 additions & 0 deletions src/python/bench/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
#

Install pymarian then

```sh
pip install torch --extra-index-url https://download.pytorch.org/whl/cu113
pip install transformers parallelformers sentencepiece sacrebleu
```

To run the benchmark,

```sh
bash benchmark.sh
```
28 changes: 28 additions & 0 deletions src/python/bench/benchmark.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
#!/usr/bin/env sh

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MODELDIR=$1 # path to a directory with decoder.yml, source.spm, target.spm files

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a short comment at the top of this file what this script does and a simple usage example.

function translate_native {
sacrebleu -t wmt20 -l en-de --echo src | \
spm_encode --model /home/marcinjd/MTMA/source.spm | ../_skbuild/linux-x86_64-3.8/cmake-build/marian-decoder --quiet \

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the path ../_skbuild/linux-... generic so that it is going to work for most users? If not, consider extracting it to a global variable.

-c /home/marcinjd/MTMA/decoder.yml -b4 --mini-batch 16 --maxi-batch 100 -d 0 1 2 3 \
| spm_decode --model /home/marcinjd/MTMA/target.spm > translations_native.txt
}

function translate_pybind {
sacrebleu -t wmt20 -l en-de --echo src | \
spm_encode --model /home/marcinjd/MTMA/source.spm | \
python test_translate.py '--config /home/marcinjd/MTMA/decoder.yml -b4 --mini-batch 16 --quiet --maxi-batch 100 -d 0 1 2 3' \
| spm_decode --model /home/marcinjd/MTMA/target.spm > translations_pybind.txt
Comment on lines +4 to +14

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, let's remove hard-coded paths:

Suggested change
sacrebleu -t wmt20 -l en-de --echo src | \
spm_encode --model /home/marcinjd/MTMA/source.spm | ../_skbuild/linux-x86_64-3.8/cmake-build/marian-decoder --quiet \
-c /home/marcinjd/MTMA/decoder.yml -b4 --mini-batch 16 --maxi-batch 100 -d 0 1 2 3 \
| spm_decode --model /home/marcinjd/MTMA/target.spm > translations_native.txt
}
function translate_pybind {
sacrebleu -t wmt20 -l en-de --echo src | \
spm_encode --model /home/marcinjd/MTMA/source.spm | \
python test_translate.py '--config /home/marcinjd/MTMA/decoder.yml -b4 --mini-batch 16 --quiet --maxi-batch 100 -d 0 1 2 3' \
| spm_decode --model /home/marcinjd/MTMA/target.spm > translations_pybind.txt
sacrebleu -t wmt20 -l en-de --echo src | \
spm_encode --model $MODELDIR/source.spm | ../_skbuild/linux-x86_64-3.8/cmake-build/marian-decoder --quiet \
-c $MODELDIR/decoder.yml -b4 --mini-batch 16 --maxi-batch 100 -d 0 1 2 3 \
| spm_decode --model $MODELDIR/target.spm > translations_native.txt
}
function translate_pybind {
sacrebleu -t wmt20 -l en-de --echo src | \
spm_encode --model $MODELDIR/source.spm | \
python test_translate.py '--config $MODELDIR/decoder.yml -b4 --mini-batch 16 --quiet --maxi-batch 100 -d 0 1 2 3' \
| spm_decode --model $MODELDIR/target.spm > translations_pybind.txt

}

function translate_hf_native {
sacrebleu -t wmt20 -l en-de --echo src | python test_hf_raw.py > translations_hf_raw.txt
}

echo -n "Native: "
time translate_native
echo
echo -n "Pybind: "
time translate_pybind
echo
echo -n "Transformers (native): "
time translate_hf_native
Loading