Getting it running
You do not need to run the Thrift server on your local machine. It can run anywhere, but the RegionServers are usually a good place *. In your code, you then connect to this server.
Python example:
from thrift.transport import TSocket
transport = TSocket.TSocket("random-regionserver", 9090)
Obviously, replace random-regionserver with one of the servers running the Thrift server.
The Thrift server reads its configuration from the usual places. If you use CDH, you will find the configuration in /etc/hbase/conf/hbase-site.xml and you will need to add the hbase.zookeeper.quorum property:
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>list of your zookeeper servers</value>
</property>
If you start the Thrift server from a downloaded Apache distribution, hbase-site.xml is likely to be in a different directory.
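To check which quorum a given hbase-site.xml actually defines, a small script can help. This is only a minimal sketch: the function name read_hbase_property and the sample host names are made up for illustration, and it assumes the usual layout of property elements inside a configuration root.

```python
import xml.etree.ElementTree as ET


def read_hbase_property(xml_text, name):
    """Return the value of property `name` from hbase-site.xml content, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


# Hypothetical file content; the host names are placeholders.
SAMPLE = """<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>
</configuration>"""

quorum = read_hbase_property(SAMPLE, "hbase.zookeeper.quorum")
print(quorum.split(","))  # the quorum is a comma-separated list of hosts
```

For a real check, read the text from the hbase-site.xml your Thrift server uses instead of SAMPLE.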
Scaling
One easy way to scale right now is to keep a list of all the RegionServers in your Thrift client and connect to a random one. Alternatively, open several connections and pick one at random for each request. Some language bindings (e.g. PHP) have TSocketPool, where you can pass in all your servers. Otherwise, you will need to do this manually.
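For bindings without a socket pool, the manual work could look roughly like this. It is a sketch, not a tested client: connect_to_any is a hypothetical helper, and the connect argument stands in for whatever code opens your Thrift transport.

```python
import random


def connect_to_any(servers, connect, rng=random):
    """Try the servers in random order and return the first connection that opens.

    `servers` is a list of (host, port) pairs. `connect` is any callable that
    opens a connection to (host, port) or raises an exception on failure.
    """
    candidates = list(servers)
    rng.shuffle(candidates)  # random order spreads the load across Thrift servers
    last_error = None
    for host, port in candidates:
        try:
            return connect(host, port)
        except Exception as error:  # real code should catch the transport's own exception type
            last_error = error
    raise ConnectionError("no Thrift server reachable") from last_error
```

With the Python Thrift bindings, connect would typically create a TSocket.TSocket(host, port), open it, and return the transport. Shuffling the whole list, rather than picking a single random entry, also gives you failover when one server is down.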
Using this method, all reads and writes should be more or less evenly distributed across the Thrift servers in your cluster. Each read or write operation that arrives at a Thrift server is still translated into a Java API call by that server, which then opens a network connection to the appropriate RegionServer to perform the requested action.
This means that you will not get the same high performance as when using the Java API. It may help to cache the region locations yourself and connect to the Thrift server on the corresponding RegionServer, but even then the extra Java API call is still made, even though it goes to the local server. HBASE-4460 will help in this scenario, but it is not included in CDH3u4 or CDH4.
* There is an issue, HBASE-4460, that actually embeds a Thrift server in the RegionServer.