We recently reimplemented the search in a client project using Solr (version 3.5). For communication between our PHP application and the Solr server we used the PHP library solr-php-client (solr-php-client.googlecode.com).
After the first release to the staging server, our client ran load tests on the whole application, and the results for the search use case were pretty bad: 80% failures and an average response time of 30 seconds. The load test was run with 1000 parallel requests over 10 hours.
When checking the server we noticed that after around 3-4 hours the library no longer got any responses back from Solr. The Solr logs, however, were empty from the point where it stopped responding to search requests until the load test was over. The system administrators found out that, around the same time Solr stopped responding, there were system errors saying that the maximum number of allowed open files on the system had been exceeded. This limit was set to 1024 open files. We started to check the number of open files regularly during the load tests and saw that it went up to 25,000.
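The kind of check described above can be scripted. The following is only a minimal sketch, assuming a Linux host where /proc is available; the function names are hypothetical, and it reports the per-process view for a single process, which may not be how the figures in this post were collected.

```python
import os


def parse_soft_limit(limits_text):
    """Extract the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns an int, or None if the limit is 'unlimited' or the line is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft = line.split()[3]  # columns: Max open files <soft> <hard> files
            return None if soft == "unlimited" else int(soft)
    return None


def count_open_fds(pid="self"):
    """Count the file descriptors a process currently holds (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_usage(pid="self"):
    """Return (open_fds, soft_limit) for the given process."""
    with open(f"/proc/{pid}/limits") as fh:
        limit = parse_soft_limit(fh.read())
    return count_open_fds(pid), limit
```

Calling fd_usage() with the process ID of the web server or PHP worker at regular intervals during a load test would show whether the count keeps growing towards the limit. Reading another user's process usually requires root.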
On the internet we found a lot of blog posts saying that by setting useCompoundFile to true and mergeFactor to a lower number, e.g. 2 instead of 10, the number of index files could be reduced and hence also the number of files opened during a search. However, these blog posts were always talking about indexes with over 1,000,000 documents, whereas our index only contained about 50,000 documents. So we started investigating in another direction.
Netstat finally pointed us the right way: we noticed that some connections to the Solr server were not properly closed after a search request. By default, solr-php-client uses the function file_get_contents() to send requests to Solr. This function opens a file handle even when it is used to open a stream, so every request to Solr counts against the open files limit. Hence, after some hours of constantly sending requests to Solr through file_get_contents(), with some of these connections not being closed properly, the maximum number of open files was exceeded.
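To make the netstat check repeatable, the output can be tallied by TCP state. This is a rough sketch only: it assumes the column layout of Linux `netstat -tan`, and it assumes Solr listens on port 8983 (the post does not say which port was used). The function name and the sample lines are invented for illustration and are not output from the project.

```python
from collections import Counter

# Invented sample in the layout of `netstat -tan` on Linux (illustration only).
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:51234          10.0.0.9:8983           ESTABLISHED
tcp        1      0 10.0.0.5:51240          10.0.0.9:8983           CLOSE_WAIT
tcp        1      0 10.0.0.5:51241          10.0.0.9:8983           CLOSE_WAIT
tcp        0      0 10.0.0.5:22             10.0.0.7:40022          ESTABLISHED
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
"""


def count_states(netstat_output, port=8983):
    """Tally TCP states of connections whose remote port matches `port`."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected fields: proto, recv-q, send-q, local, foreign, state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        remote_port = fields[4].rsplit(":", 1)[-1]
        if remote_port == str(port):
            states[fields[5]] += 1
    return states
```

Feeding it the live output of `netstat -tan` every few minutes during a load test would show whether connections towards Solr pile up in a state other than ESTABLISHED instead of disappearing.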
solr-php-client lets you choose the HTTP transport: file_get_contents() or curl. After changing the HTTP transport to curl for all our requests to Solr, the failure rate in the load tests went down to 0%, the average response time dropped to 0.4 seconds, and the number of open files during the load test peaked at around 250.