Couchbase Server: tips for troubleshooting issues
Recently at work we started using Couchbase Server to replace a rather outdated caching solution in our architecture. This was the first time I had to use Couchbase and its .NET SDK, and I have encountered a couple of issues along the way.
This post is a recollection of the problems we faced. (If you are interested in a "getting started" tutorial, I recommend reading the official documentation of the .NET SDK, it has a pretty good description about what you can with the SDK, and the API is rather simple, so you shouldn't have a hard time getting started with it.)
Probably the single most important tool for troubleshooting problems for me was using the built-in logging of the .NET SDK. It can give much more detailed insights into the mechanisms going on in the SDK than what you can get from the
OperationResult object you get from the API.
The SDK uses
log4net. Enabling file-based logging is very simple, you basically just have to add the following section to your app or web configuration file.
<configuration> <configSections> <sectionGroup name="common"> <section name="logging" type="Common.Logging.ConfigurationSectionHandler, Common.Logging" /> </sectionGroup> <section name="log4net" type="log4net.Config.Log4NetConfigurationSectionHandler, log4net" /> </configSections> <common> <logging> <factoryAdapter type="Common.Logging.Log4Net.Log4NetLoggerFactoryAdapter, Common.Logging.Log4Net"> <arg key="configType" value="INLINE" /> </factoryAdapter> </logging> </common> <log4net> <appender name="FileAppender" type="log4net.Appender.FileAppender"> <param name="File" value="C:\temp\log.txt" /> <layout type="log4net.Layout.PatternLayout"> <conversionPattern value="%date [%thread] %level %logger - %message%newline" /> </layout> </appender> <root> <level value="INFO" /> <appender-ref ref="FileAppender" /> </root> </log4net> </configuration>
Source and more details on the official Couchbase blog: Couchbase .NET SDK 2.0 Development Series: Logging
When troubleshooting timeout problems or sluggishness, it can be beneficial to take a look at what kind of requests the SDK is trying to send, and what ports it is trying to access.
You don't necessarily have to understand what the requests are about, it is often enough to
see which request is failing to get some clue about the problem.
Keep in mind that you might have to explicitly configure your application to use the proxy provided by Fiddler, you can find the details here.
Also, if you have your Couchbase server installation on the local machine, you have to use
ipv4.fiddler instead of
localhost in the url configuration.
It is pretty easy to run into networking issues when using Couchbase, especially if the server and the client run in a different network or environment, for example in different clouds.
One of the reasons for these issues can often be a firewall between the two computers, because Couchbase uses many different ports, and not just port 80. You can find the list of these ports in the official documentation.
This can cause either the connection to not work at all, or can cause long response times of the SDK.
Luckily these issues are usually easy to spot, because they produce an error in the logs, or you can see the failed requests with Fiddler, and opening the port in the firewall solves the problem.
There is still one problem I couldn't solve. If I hosted a Couchbase cluster in Amazon EC2, and tried to access it from a virtual machine in Azure, I got randomly slow response times. You can find the details in this Stack Overflow question. Luckily this was only a test setup, our production architecture is different, but it would be still interesting to find the root cause of this.
There is a surprising gotcha related to the bucket configurations of the SDK.
There are two ways to open buckets: you can either statically preconfigure the bucket to use in the app.config:
<couchbaseClients> <couchbase useSsl="false" operationLifeSpan="1000"> <servers> <add uri="http://192.168.56.101:8091/pools"></add> </servers> <buckets> <add name="my-bucket-name" useSsl="false" password="" operationLifespan="2000"> <connectionPool name="custom" maxSize="10" minSize="5" sendTimeout="12000"></connectionPool> </add> </buckets> </couchbase> </couchbaseClients>
In this case if you try to open the bucket
my-bucket-name, its configuration will be picked up from the config file.
However, there can be situations in which the name of the buckets you want to use are only known dynamically during runtime, so you cannot preconfigure any buckets. In that case you
couchbaseClients section will look like this:
<couchbaseClients> <couchbase useSsl="false" operationLifeSpan="1000"> <servers> <add uri="http://192.168.56.101:8091/pools"></add> </servers> </couchbase> </couchbaseClients>
The weird thing is that now if you try to open a bucket by using
cluster.OpenBucket(bucketName), the SDK does not pick up the server configuration from the
<servers> section, but at first it tries to connect to
localhost, and it only uses the proper server after that fails.
This can have two consequences:
- If we have no Couchbase server running at localhost:8091, opening the bucket can take seconds, because the client is trying to connect to localhost at first, and only after that fails does it connect to the configured server. This problem and a workaround is described in this SO question.
- If we actually do have a server at localhost:8091, then that is getting used instead of the one configured in
<servers>. I consider this clearly a bug.
The workaround is simple: you just have to add a dummy bucket to the configuration:
<couchbaseClients> <couchbase useSsl="false" operationLifeSpan="1000"> <servers> <add uri="http://192.168.56.101:8091/pools"></add> </servers> <buckets> <add name="dummy-bucket" useSsl="false" operationLifespan="1000"> </add> </buckets> </couchbase> </couchbaseClients>
A bucket with the specified name doesn't even have to exist on the server. Solely the existence of that section will make the SDK pick up the server configuration from the
TCP connection count
If you are accessing Couchbase at a high volume, possibly on multiple threads, the client can easily run out of available TCP connections. The maximum number of TCP connections can be configured in the app.config. Its default value is 2, which is really low, and it can cause the operation to fail. In the logs and in the result object you can see an exception similar to this.
The operation has timed out. Couchbase.IO.ConnectionUnavailableException: Failed to acquire a pooled client connection on 192.168.56.101:11210 after 5 tries. at Couchbase.IO.ConnectionPool`1.Acquire() at Couchbase.IO.ConnectionPool`1.Acquire() at Couchbase.IO.ConnectionPool`1.Acquire() at Couchbase.IO.ConnectionPool`1.Couchbase.IO.IConnectionPool.Acquire() at Couchbase.IO.Strategies.DefaultIOStrategy.Execute[T](IOperation`1 operation) at Couchbase.Core.Server.Send[T](IOperation`1 operation)
The solution is to increase the number of maximum TCP connections, which can be separately controlled on a default level, or for a specific bucket:
<couchbaseClients> <couchbase> <servers> <add uri="http://192.168.56.101:8091/pools"/> </servers> <connectionPool name="default" maxSize="35" minSize="5" sendTimeout="15000"> </connectionPool> <buckets> <add name="default"> <connectionPool maxSize="30" minSize="5" name="default"/> </add> </buckets> </couchbase> </couchbaseClients>
For us a pool size of 30 seems to have solved the problem, you should fine tune the configuration until you find a proper value for your scenario.
I found this solution with an official answer in the Couchbase Forums
The importance of
When we use the
OpenBucket of the
ICluster interface, a new TCP connection will be made to the Couchbase server. That bucket object should be used in a
using block, and when the object is disposed, the connection will be closed. The performance of this approach is not ideal if we are accessing Couchbase at a high volume.
Luckily the SDK contains support for pooling TCP connection in the form of the
ClusterHelper class. A couple of things to keep in mind:
- During the startup of your application (for example in the
Global.asax.csin case of a web application) you need to initialize the helper by calling
- When your application shuts down, you should call
- If you open a bucket with
ClusterHelper.GetBucket(...), you should not put it in a using block. The ClusterHelper will internally manage and dispose the bucket objects.
You can find more details about this in the official documentation.
Couchbase is a very exciting technology, on which big companies are betting (Ebay, LinkedIn, Ryanair, etc.). It's great to be able to replace an outdated legacy caching system with something more modern, and far better scalable.
However, the above problems can be quite annoying, and they can make the introduction of Couchbase in your architecture more challenging.
I hope this post will help you avoid making the same mistakes, or at least spend less time looking for the solution.