Using Apache Thrift for Python & C++
Apache Thrift is a software framework for cross-language service development. It provides what is essentially a remote procedure call (RPC) interface, letting a client application call a service that can be written in the same language or a different one. Thrift supports all of the major languages that you’d expect, including Python, C++, Java, JavaScript / Node.js, and Ruby.
Unfortunately, whilst there are quite a few tutorials on Thrift, some of them concentrate on explaining how Thrift works behind the scenes (which is important, of course) rather than on how to use it. There also aren’t many that concentrate on C++. So having spent some time working through some of the tutorials (and the book Learning Apache Thrift), I thought I’d have a go at writing something of a practical guide. I’m quite deliberately not going to focus on how Thrift works: if you want that, let me suggest the Apache Thrift – Quick Tutorial. I’m also going to assume that you have Thrift installed on your computer already (again, there are lots of sets of instructions on how to do this; Google will be your friend), and that you’re using a Linux or macOS computer. You should be able to follow along with most of this if you’re using Windows, but there will be a few differences, especially when it comes to compiling C++ files.
The Service Description File
The first thing that you need to do when using Thrift is to write a description of the service that you want to create, using the Thrift Interface Definition Language (IDL).
The format of the file is pretty simple. At its simplest it can contain just a few lines and still do something useful.
service Logger {
    oneway void timestamp (1: string filename)
}
As you can see, this defines a service called Logger, which has one method named timestamp, which takes a single argument (of a string type) called filename, and which doesn’t return anything. The keyword oneway in the definition means that the code generated by Thrift will produce a client function that sends the request and carries on without waiting for a reply from the server.
Thrift has a small set of base types, which map onto the corresponding data types in each of the supported languages.
Type | Detail |
---|---|
bool | Boolean |
byte | An 8-bit signed integer |
i16 | A 16-bit signed integer |
i32 | A 32-bit signed integer |
i64 | A 64-bit signed integer |
double | A 64-bit floating-point value |
string | A string |
Note that there are no unsigned data types in Thrift…
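To make the "no unsigned types" point concrete, here is a small plain-Python sketch (it doesn't use Thrift at all) that works out the range of each integer type in the table and picks the narrowest one that can hold a given value. The helper names signed_range, fits and smallest_type_for are my own inventions for illustration; they are not part of Thrift.

```python
# Bit widths of Thrift's integer types, taken from the table above.
INTEGER_BITS = {"byte": 8, "i16": 16, "i32": 32, "i64": 64}


def signed_range(type_name):
    """Return the (lowest, highest) value a signed integer type can hold."""
    bits = INTEGER_BITS[type_name]
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits(type_name, value):
    """True if value can be carried by the named type without overflow."""
    low, high = signed_range(type_name)
    return low <= value <= high


def smallest_type_for(value):
    """Pick the narrowest Thrift integer type that can carry value."""
    for type_name in ("byte", "i16", "i32", "i64"):
        if fits(type_name, value):
            return type_name
    raise ValueError("%d does not fit in any Thrift integer type" % value)
```

So a value that would sit happily in an unsigned 32-bit integer, such as 4294967295, has to be declared as an i64 in the .thrift file if you want to send it as-is.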
Thrift also supports the definition of structs, as well as lists, sets, and maps (dictionaries).
Let’s look at a more complete example .thrift file.
namespace cpp Logger
namespace py LoggerPy

service Logger {
    oneway void timestamp (1: string filename)
    string get_last_log_entry (1: string filename)
    void write_log (1: string filename, 2: string message)
    i32 get_log_size (1: string filename)
}
This file also introduces another concept: Thrift namespaces. These are optional, but when included they let you specify the namespace to be used for the Thrift-generated code. They are language-specific (in the example py relates to Python, and cpp relates to C++).
You can also define multiple services in one file, but that makes the generated code even more complex (as each service is implemented separately); so for simplicity, I’d suggest defining only a single service per file.
There’s one last thing we can add to the file before we run the Thrift generator. In this example, given that we’re dealing with file I/O, we might have situations where that I/O causes errors. Typically we’d handle such errors by the use of exceptions. Thrift lets us define exceptions to be used in conjunction with Thrift code.
So here’s the full version of the LoggerService.thrift file. (Note that the name of the file will be used in the names of some of the resulting files; so whilst you can name the file however you like, I’d recommend naming it something sensibly related to the service that it defines).
namespace py LoggerPy
namespace cpp LoggerCpp

exception LoggerException {
    1: i32 error_code,
    2: string error_description
}

service Logger {
    oneway void timestamp (1: string filename)
    string get_last_log_entry (1: string filename) throws (1: LoggerException error)
    void write_log (1: string filename, 2: string message) throws (1: LoggerException error)
    i32 get_log_size (1: string filename) throws (1: LoggerException error)
}
This is pretty much the same as the previous example, but you can now see more clearly why we’d want to use oneway for some void methods but not for others. Since by definition the calling client won’t wait if we specify a method as oneway, we can’t throw an exception back to the client. So in this example we’ll have to handle any error in timestamp silently within the function that implements it.
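If the reasoning about oneway and exceptions feels a bit abstract, the following is a plain-Python analogy (it is not Thrift code, and every name in it is made up for illustration). A thread pool stands in for the server: the "normal" call waits for the result, so the error comes back to the caller; the "oneway" call hands over the request and returns immediately, so the error is never seen.

```python
from concurrent.futures import ThreadPoolExecutor


class LogError(Exception):
    """Hypothetical stand-in for the LoggerException declared in the IDL."""


def failing_service(filename):
    # Pretend the server could not open the file.
    raise LogError("Could not open file " + filename)


def call_and_wait(executor, func, *args):
    # Like a normal method: block for the reply, so an exception raised
    # on the "server" side travels back to the caller.
    return executor.submit(func, *args).result()


def call_oneway(executor, func, *args):
    # Like a oneway method: hand the request over and return at once.
    # Nothing is read back, so the caller never learns about the failure.
    executor.submit(func, *args)
    return None


def demo():
    with ThreadPoolExecutor(max_workers=1) as executor:
        oneway_result = call_oneway(executor, failing_service, "missing.log")
        try:
            call_and_wait(executor, failing_service, "missing.log")
            saw_error = False
        except LogError:
            saw_error = True
    return oneway_result, saw_error
```

Calling demo() gives back None for the oneway call and reports that only the waiting call saw the error, which is exactly why a oneway method has to deal with its own failures on the server side.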
Generating Code
Now that we have the IDL specification complete for our service, we can invoke Thrift and tell it to build some code in the languages that we specify. In this example, I’m going to start by defining the service in C++, and have that service called by a Python client.
To run Thrift, simply run:
thrift --gen py --gen cpp LoggerService.thrift
Each language generator specified will result in a directory named gen-xxx (where xxx is the Thrift shorthand for the language, e.g. cpp, py, etc.) in the folder that Thrift was run from.
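If you’d rather drive the generator from a script, the sketch below builds the same command line and works out which gen-xxx directories to expect. It only assembles the arguments; the function names thrift_command and output_dirs are my own, and it assumes the thrift executable is on your PATH when you eventually run the command.

```python
import os


def thrift_command(idl_file, languages):
    """Build the argument list for one --gen option per language."""
    command = ["thrift"]
    for language in languages:
        command += ["--gen", language]
    command.append(idl_file)
    return command


def output_dirs(languages, run_dir="."):
    """Directories the generators should create, one gen-xxx per language."""
    return [os.path.join(run_dir, "gen-" + language) for language in languages]
```

The list returned by thrift_command("LoggerService.thrift", ["py", "cpp"]) can be passed straight to subprocess.run(..., check=True) from the folder containing the .thrift file.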
C++
For C++, Thrift actually helps us out quite a lot by building a dummy version of the server implementation, which will be named (in this case) Logger_server.skeleton.cpp.
It also generates a pair of .cpp and .h files named for the service (here Logger.cpp and Logger.h, which highlights a good reason not to include the word service in the name of your service), and two pairs of files named LoggerService_types and LoggerService_constants (taking their name from the name of the .thrift file used for the generation).
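Because some of the generated files take their name from the service and others from the .thrift file, it’s easy to lose track of what to expect in gen-cpp. Here’s a rough sketch that lists them, based purely on the naming pattern described in this post; expected_cpp_files is a hypothetical helper, and other Thrift versions may generate a slightly different set.

```python
import os


def expected_cpp_files(thrift_file, service_name):
    """List the C++ files described above for one service in one IDL file."""
    # Files named after the .thrift file use its name without the extension.
    base = os.path.splitext(os.path.basename(thrift_file))[0]
    files = [
        service_name + "_server.skeleton.cpp",  # dummy server to copy from
        service_name + ".cpp",
        service_name + ".h",
    ]
    for suffix in ("_types", "_constants"):
        files.append(base + suffix + ".cpp")
        files.append(base + suffix + ".h")
    return files
```

For LoggerService.thrift and the Logger service this gives seven files, which is a handy list to check against when filling in the source files for your build script.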
These files (as with pretty much all machine generated code) are pretty heavy-going to try to read; but you don’t really need to do anything other than include them in the compilation stage…
The one exception to that is the server. Recommended practice is to make a copy of the …skeleton.cpp file, and to build out from there.
The skeleton is a pretty short file — and you should be able to see very easily which bits you need to change to actually make your service do something useful. If we don’t make any changes to the server it should still run, but obviously won’t do anything especially useful (apart from echoing the name of the service method to the server process’s stdout).
For example, here’s the code that the server will use to write a message line to the log.
void write_log(const std::string& filename, const std::string& message) {
  std::fstream file;
  // Open for appending, so existing log entries are kept.
  file.open(filename.c_str(), std::fstream::out | std::fstream::app);
  if (file.is_open()) {
    file << message << std::endl;
    file.close();
  } else {
    // Report the failure to the client via the exception from the IDL.
    LoggerException err;
    err.error_code = 1;
    err.error_description = std::string("Could not open file ") + filename;
    throw err;
  }
}
A slightly more complex example is the method to return the last line from the log file.
...
void get_last_log_entry(std::string& _return, const std::string& filename) {
  std::fstream file;
  file.open(filename.c_str(), std::fstream::in);
  if (file.is_open()) {
    std::string line;
    std::string lastline;
    // Read every line; whatever was read last is the final log entry.
    while (std::getline(file, line))
      lastline = line;
    _return = lastline;
    file.close();
  } else {
    LoggerException err;
    err.error_code = 1;
    err.error_description = std::string("Could not open file ") + filename;
    throw err;
  }
}
Note that when using a string as the return value in C++ we can’t use the normal C++ function return; but rather the generated C++ function returns void, and has an extra parameter named _return. This has been added by Thrift in code-generation, and is a pass-by-reference parameter that we set the “return” value to before we exit the method definition.
The service is defined as a C++ class, so we can write a properly object-oriented solution if we want to. For this simple example I’m not going to do that: so turning this into a version following good OO practice, where we persistently identify the log file name (for example), is left as an exercise for the reader… 🙂
Anyway, the complete code for the server is as follows.
// LoggerServer.cpp
#include "Logger.h"
#include <thrift/protocol/TBinaryProtocol.h>
#include <thrift/server/TSimpleServer.h>
#include <thrift/transport/TServerSocket.h>
#include <thrift/transport/TBufferTransports.h>

#include <chrono>
#include <cstdio>
#include <ctime>
#include <fstream>
#include <string>
#include <iostream>

using namespace ::apache::thrift;
using namespace ::apache::thrift::protocol;
using namespace ::apache::thrift::transport;
using namespace ::apache::thrift::server;

// boost::shared_ptr matches the Thrift release this was written against;
// newer Thrift releases use std::shared_ptr instead.
using boost::shared_ptr;

using namespace ::LoggerCpp;

class LoggerHandler : virtual public LoggerIf {
 public:
  LoggerHandler() {
    // Your initialization goes here
  }

  // oneway: the client is not waiting for a reply, so a failure is
  // reported on the server's stderr rather than thrown back.
  void timestamp(const std::string& filename) {
    std::fstream file;
    file.open(filename.c_str(), std::fstream::out | std::fstream::app);
    if (file.is_open()) {
      std::time_t now;
      now = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());
      file << std::ctime(&now);
      file.close();
    } else {
      std::cerr << "Could not open file " << filename << std::endl;
    }
  }

  void get_last_log_entry(std::string& _return, const std::string& filename) {
    std::fstream file;
    file.open(filename.c_str(), std::fstream::in);
    if (file.is_open()) {
      std::string line;
      std::string lastline;
      while (std::getline(file, line))
        lastline = line;
      _return = lastline;
      file.close();
    } else {
      LoggerException err;
      err.error_code = 1;
      err.error_description = std::string("Could not open file ") + filename;
      throw err;
    }
  }

  void write_log(const std::string& filename, const std::string& message) {
    std::fstream file;
    file.open(filename.c_str(), std::fstream::out | std::fstream::app);
    if (file.is_open()) {
      file << message << std::endl;
      file.close();
    } else {
      LoggerException err;
      err.error_code = 1;
      err.error_description = std::string("Could not open file ") + filename;
      throw err;
    }
  }

  int32_t get_log_size(const std::string& filename) {
    int32_t fs = 0;
    std::fstream file;
    // Open positioned at the end, so tellg() gives the size in bytes.
    file.open(filename.c_str(), std::fstream::in | std::fstream::ate);
    if (file.is_open()) {
      fs = static_cast<int32_t>(file.tellg());
      file.close();
    } else {
      LoggerException err;
      err.error_code = 1;
      err.error_description = std::string("Could not open file ") + filename;
      throw err;
    }
    return fs;
  }
};

int main(int argc, char **argv) {
  int port = 9090;
  shared_ptr<LoggerHandler> handler(new LoggerHandler());
  shared_ptr<TProcessor> processor(new LoggerProcessor(handler));
  shared_ptr<TServerTransport> serverTransport(new TServerSocket(port));
  shared_ptr<TTransportFactory> transportFactory(new TBufferedTransportFactory());
  shared_ptr<TProtocolFactory> protocolFactory(new TBinaryProtocolFactory());

  TSimpleServer server(processor, serverTransport, transportFactory, protocolFactory);
  server.serve();
  return 0;
}
Having added our (example) code to the server file, all we need to do now is to build it.
Given that you need to compile in quite a few support files, I thoroughly recommend using CMake to write the build scripts for you.
Here’s my CMakeLists.txt file for this project.
cmake_minimum_required(VERSION 3.5)
project(LoggerService)

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -Wall -DHAVE_INTTYPES_H -DHAVE_NETINET_IN_H")
set(THRIFT_DIR "/usr/local/include/thrift")
set(BOOST_DIR "/usr/local/Cellar/boost/1.60.0_2/include/")

include_directories(${THRIFT_DIR} ${BOOST_DIR} ${CMAKE_SOURCE_DIR})
link_directories(/usr/local/lib)

set(BASE_SOURCE_FILES Logger.cpp LoggerService_types.cpp LoggerService_constants.cpp)
set(SERVER_FILES LoggerServer.cpp)

add_executable(LoggerServer ${SERVER_FILES} ${BASE_SOURCE_FILES})
target_link_libraries(LoggerServer thrift)
It’s hopefully self-explanatory, though you will need to change (if necessary) the paths to the install for Thrift and Boost (a dependency for Thrift), and substitute your filenames for the code files.
Python
Having built the server in C++, we can now turn our attention to Python, where we’ll make the client application.
Unfortunately Thrift doesn’t give us quite the same skeleton to work from; but the file itself is pretty simple, so we can easily write it from scratch.
import sys
sys.path.append('gen-py')

from LoggerPy import Logger
from LoggerPy.ttypes import LoggerException

from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

try:
    transport = TSocket.TSocket('localhost', 9090)
    transport = TTransport.TBufferedTransport(transport)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = Logger.Client(protocol)
    transport.open()

    logfile = "logfile.log"
    client.timestamp(logfile)
    print("Logged timestamp to log file")
    client.write_log(logfile, "This is a message that I am writing to the log")
    print("Last line of log file is: %s" % (client.get_last_log_entry(logfile)))
    print("Size of log file is: %d bytes" % client.get_log_size(logfile))

    transport.close()
except TTransport.TTransportException:
    print("Error starting client")
except LoggerException as e:
    # Our own exception type, so the fields from the .thrift file exist.
    print("Error: %d %s" % (e.error_code, e.error_description))
except Thrift.TException as e:
    # Any other Thrift error; it has no error_code or error_description.
    print("Thrift error: %s" % e)
As you can see there’s not a lot to do there. The first dozen or so lines are essentially boilerplate code, and after that all we need to do is create our client, and call the methods we wish to invoke from the server; handling any exceptions we generate from the server (except Thrift.TException…), and the case where the transport fails to open (which is usually caused by the server not running).
In part two, we’ll turn this the other way around, and see how to call a Python server from C++.
4 thoughts on “Using Apache Thrift for Python & C++”
Filed under: C++,Computing,Python,Uncategorised - @ August 17, 2016 11:03
As a quick follow-up I should add that the ‘throws’ clause for a given function in the .thrift file has a curious effect (at least in Python). Your server can still throw an exception – but it’s not of your user-defined type: but rather a generic TApplicationException. Troubleshooting this can be quite tricky until you spot the error!
Not that I did this, of course… 🙂
Hi, Great tutorial.
I have a few issues in following this tutorial.
First of all I have installed boost but I don’t have a “Cellar” folder that you mentioned in cMakeList file:
set(BOOST_DIR "/usr/local/Cellar/boost/1.60.0_2/include/")
I use “/usr/local/lib” as boost_dir in this file.
I’m not sure if it’s reasonable or not!
My Second issue is that I got error after making file using these commands:
cmake ..
make
I got following error after make command:
Scanning dependencies of target LoggerServer
[ 20%] Building CXX object CMakeFiles/LoggerServer.dir/LoggerServer.cpp.o
… gen-cpp/LoggerService_types.h:15:10: fatal error: thrift/cxxfunctional.h: No such file or directory
#include <thrift/cxxfunctional.h>
^~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
CMakeFiles/LoggerServer.dir/build.make:62: recipe for target ‘CMakeFiles/LoggerServer.dir/LoggerServer.cpp.o’ failed
make[2]: *** [CMakeFiles/LoggerServer.dir/LoggerServer.cpp.o] Error 1
CMakeFiles/Makefile2:67: recipe for target ‘CMakeFiles/LoggerServer.dir/all’ failed
make[1]: *** [CMakeFiles/LoggerServer.dir/all] Error 2
Makefile:83: recipe for target ‘all’ failed
I’m new in Thrift and cmake. Can you explain the cmake level more please?
I’m using Ubuntu 18.04
Thanks in advance.
Sorry for the delayed reply Lena.
The first question is an easy one. You’ll only see a “Cellar” folder on MacOS (not Ubuntu or any other Linux) – as it’s an artifact of the homebrew package management tool that’s used on MacOS.
The second question is, I think, related to the first. I suspect that on Ubuntu the boost_dir should be /usr/include/boost/ – as that’s where the “include” files (i.e. the cpp header files) typically live.
I hope this helps.