Apache logging to central syslog server

Posted 03/07/2010 09:58

Apache web server traditionally writes to local log files in /var/log/httpd.

At work we have been looking into PCI compliance, and it requires that log files are stored centrally so that if a server gets compromised and the local log files are modified, there is still an authoritative copy on the central log server.

Syslog is the standard Linux and UNIX way for transmitting log entries to a central server.

The problem is that Apache only supports logging to syslog for its error log, and not its access log.

Thankfully the problem is relatively easy to fix with a short Perl script.

Apache Log Config

I am doing this on a RedHat server, so the config file locations will be specific to that.

Create a new file called /etc/httpd/conf.d/syslog.conf:

LogLevel warn
LogFormat "%v %V %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" vcombined
CustomLog "|/usr/bin/httpd_syslog" vcombined
ErrorLog syslog:local2
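
As a rough illustration of what each vcombined line contains, here is a minimal Python sketch that splits one into named fields. The field names, the sample line and the regular expression are assumptions made for this example, not anything Apache provides, and the pattern does not cope with escaped quotes inside the request or header values.

```python
import re

# Field names are illustrative; they follow the order of the vcombined LogFormat.
VCOMBINED = re.compile(
    r'(?P<vhost>\S+) (?P<server_name>\S+) (?P<client>\S+) (?P<ident>\S+) '
    r'(?P<user>\S+) \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_vcombined(line):
    """Return a dict of fields, or None if the line does not match."""
    match = VCOMBINED.match(line.strip())
    return match.groupdict() if match else None


# Made-up sample line, not taken from a real server.
SAMPLE = ('www.example.com www.example.com 192.0.2.10 - - '
          '[03/Jul/2010:09:58:00 +0100] "GET /index.html HTTP/1.1" '
          '200 1234 "-" "Mozilla/5.0"')

fields = parse_vcombined(SAMPLE)
print(fields["vhost"], fields["status"], fields["request"])
```

Once the line has passed through syslog it will carry a timestamp and host prefix, so a real parser would need to strip that first.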

Next create a Perl script /usr/bin/httpd_syslog to accept piped logs from Apache and send them to the local syslog service:

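#!/usr/bin/perl
use strict;
use warnings;
use Sys::Syslog qw( :DEFAULT setlogsock );

# Talk to the local syslog daemon over its UNIX socket
setlogsock('unix');
openlog('httpd', 'cons,pid', 'local1');

# Read from STDIN and log to syslog
while (my $log = <STDIN>) {
        chomp $log;
        # Pass the line as an argument to '%s' so that '%' characters in the
        # request (e.g. URL encoding) are not treated as format codes
        syslog('notice', '%s', $log);
}
closelog();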

You can now reload Apache and both error logs and access logs should start flowing into /var/log/messages.

Syslog Config

Next up, we need to split out the error logs and access logs into their own files again, rather than have them appearing in the main /var/log/messages file.

To do this we need to modify /etc/syslog.conf:

Comment out the line:

*.info;mail.none;authpriv.none;cron.none  -/var/log/messages

And replace it with:

*.info;mail.none;authpriv.none;cron.none;local1.none;local2.none  -/var/log/messages
local1.*  -/var/log/httpd_access_log
local2.*  -/var/log/httpd_error_log

This tells the local syslog service not to log entries from local1 and local2 facilities (which we are using for Apache) into /var/log/messages, and instead log them into separate log files.
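
To make the selector logic concrete, here is a small Python sketch of how a line such as the one above decides whether a message is written to a file. It is a simplified model (later parts of the selector override earlier ones, and a priority matches that level and anything more severe), not the real syslogd implementation, and the function name is made up for this example.

```python
# Severity names from most to least severe, as used in syslog.conf.
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]


def selector_matches(selector, facility, severity):
    """Simplified check of a 'fac.pri;fac.pri' selector against one message."""
    matched = False
    for part in selector.split(";"):
        part_facility, _, level = part.partition(".")
        if part_facility not in ("*", facility):
            continue  # this part says nothing about our facility
        if level == "none":
            matched = False  # explicitly excluded
        elif level == "*":
            matched = True
        else:
            # Lower index means more severe, so "info" also matches "notice".
            matched = SEVERITIES.index(severity) <= SEVERITIES.index(level)
    return matched


MESSAGES = "*.info;mail.none;authpriv.none;cron.none;local1.none;local2.none"

print(selector_matches(MESSAGES, "daemon", "info"))    # ordinary message
print(selector_matches(MESSAGES, "local1", "notice"))  # Apache access log
print(selector_matches("local1.*", "local1", "notice"))
```

With the modified line, an access log entry (local1.notice) no longer matches the /var/log/messages selector but does match local1.*, which is the split described above.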

Central Syslog Server

To instruct the local syslog server to send all entries to a central server, add the following line to /etc/syslog.conf:

*.* @IP_ADDRESS_OF_LOG_SERVER

Now restart syslog:

service syslog restart

Log Rotation

Finally, ensure that the new log files get rotated along with the other syslog files by modifying the top line of /etc/logrotate.d/syslog so it looks like this:

/var/log/messages /var/log/secure /var/log/maillog 
/var/log/spooler /var/log/boot.log /var/log/cron 
/var/log/httpd_access_log /var/log/httpd_error_log {

Git and Github in the workplace

Posted 18/05/2010 20:05

Today we started using Git. We are using Github to host our repositories for a private project. Previously we have been an SVN shop and have built various deployment systems around it.

I have encountered the following issues, and will separate them into two sections: those relating to Git as a tool, and those relating to Github as a service.

Github Issues

User management and access control

As a Systems Administrator and Developer I spend my time both writing applications and maintaining services for other developers. One of these services is version control. In the past we used SVN and set up a master password list containing users for each developer. It wasn't perfect, but because it was hosted internally, the only way to access it from outside was using a VPN that authenticated against the master Active Directory server.

Git as a tool seems to provide good user management and access control by utilising SSH, where each user can have either a separate system account, or log in to a shared account using an SSH key.

In the event that an employee left and needed their account revoked, simply removing that user or their key would suffice. In addition because it would be hosted internally, any access would be done through the VPN, and presumably their AD account would also be shut down.

However with Github, which encourages 'social coding', the access control and user management are severely limited. Firstly, each user has their own Github account that is in no way related to the company's project account. They are then invited to join a company repository as a 'collaborator'. Unfortunately this is done on a per-repository basis, meaning that removal of a user must be done manually for each.

No prizes for guessing who is going to get lumbered with that task!

Archive Support

Github have disabled archive support in Git, meaning that you cannot export a tagged release for packaging. Instead you have to clone or pull a repo, and then package it. This is inefficient as it also pulls down all commit changes, which are not necessary when packaging software.

Git Issues

The first issue relates to the previous issue. This is extracting code from a repo for packaging without also downloading all the commit changes.

There is no equivalent of svn export, and whilst archive appears to be what I need, Github does not support this. I don't know if this is a problem with Git or Github, but either way it is annoying.
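
For reference, once a clone already exists locally, git archive can write out a single tagged tree without the commit history, which is close to what svn export does. The Python sketch below only wraps that command; the tag, prefix and output path are hypothetical, and it assumes a Git version recent enough to accept --format=tar.gz. It does not avoid the initial clone from Github.

```python
import subprocess


def archive_command(tag, prefix, output_path):
    """Build the git command that exports one tag without its history."""
    return ["git", "archive", "--format=tar.gz",
            f"--prefix={prefix}/", "-o", output_path, tag]


def export_tag(repo_dir, tag, prefix, output_path):
    """Run git archive inside an existing local clone.

    Raises subprocess.CalledProcessError if git reports a failure.
    """
    subprocess.run(archive_command(tag, prefix, output_path),
                   cwd=repo_dir, check=True)


# Hypothetical names, shown only to illustrate the resulting command line.
print(" ".join(archive_command("v1.0", "myapp-1.0", "/tmp/myapp-1.0.tar.gz")))
```

Calling export_tag("/path/to/clone", "v1.0", "myapp-1.0", "/tmp/myapp-1.0.tar.gz") would then produce a tarball whose contents sit under a myapp-1.0/ directory.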

The other issue is complexity. Granted I am new to all this distributed version control stuff, so will stick with it. But when I started using SVN it just clicked and immediately made my life easier. I'm not sure when I will be needing Git's advanced features, but currently Git itself doesn't seem to offer any benefits to my workflow (although Github's code review is useful).

Conclusion

So all in all, not too bad of a start with Git. Although at the moment I am not using the full power of Git (i.e. forking/merging) as I have absolutely no need for it and would prefer to spend my time coding than messing with version control.

I am currently just pushing back to the master repository at Github, and although this isn't very 'cool', it suits my needs well (as SVN did).

My message to those considering switching to Git would be: If you are having trouble with conflicting commits or are looking to use a hosted service like Github that gives you added extras (like code reviews) then go for it.

If on the other hand you are happy with SVN and its limitations, then stick with it, and if you are looking for the added benefits of a hosted service, try something like Unfuddle.

Threading Vs. Forking and PHP

Posted 01/03/2010 19:13

Found two excellent articles today on Threading Vs. Forking, and using the forking extension in PHP (pcntl).

Suffice to say, being a Linux user, I prefer forking to threading. I have had experience with threading in Java, but I guess I just enjoy the safety net that forking gives you. I also prefer the fact that you have to have a defined protocol between two processes to achieve IPC (inter-process communication), whereas in threading each thread can access shared memory directly.
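
The difference can be sketched in a few lines. The example below is in Python rather than PHP's pcntl extension, and the function name is made up for illustration. The child process changes a global variable, which the parent never sees, and the only way the result gets back is over an explicit pipe.

```python
import os

counter = 0  # after fork() each process has its own copy of this


def square_in_child(value):
    """Fork a child, let it compute value squared, and read the reply from a pipe."""
    global counter
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: this change is invisible to the parent.
        counter += 1
        os.close(read_fd)
        os.write(write_fd, f"{value * value}\n".encode("ascii"))
        os.close(write_fd)
        os._exit(0)  # leave immediately without running parent cleanup code
    # Parent process: close the unused write end, then read the child's reply.
    os.close(write_fd)
    with os.fdopen(read_fd, "r", encoding="ascii") as reader:
        reply = reader.readline()
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return int(reply)


result = square_in_child(7)
print(result, counter)  # prints "49 0"
```

The "protocol" here is just one line of ASCII text, but it has to be agreed between the two processes, which is the safety net that threads sharing memory do not give you.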

StartCom Free SSL Certificates

Posted 01/03/2010 18:58

Yesterday I started using StartCom's free SSL certificate authority to replace some self-signed certificates I had previously been using on my web sites.

StartCom are, as far as I am aware, the only company who provide free 'proper' SSL certificates (i.e. ones that have their Certificate Authority certificate embedded in the majority of web browsers).

The process was fairly painless. You first have to generate an SSL certificate for your web browser, which allows you access to their control panel (no usernames and passwords).

Then to get an SSL certificate for a domain name, you have to validate that you own that domain (by sending an email to postmaster@domain.com). This took a long time to complete (5 minutes or so), so I thought it had crashed on the first few attempts.

Eventually though it completed, and after that it was a breeze.

As with anything to do with SSL, having a basic understanding about what all these keys and certificate files actually do is essential, otherwise you will quickly get confused.

PHP UK 2010 Conference

Posted 28/02/2010 18:49

On Friday Feb 26 2010 I visited the PHP UK 2010 Conference in London with two of my colleagues from work.

This was my first PHP conference, and I enjoyed the presentations. Probably the most useful to me was the talk given by Sticky Eyes on optimising MySQL and Message Queues for a high traffic SEO agency.

This talk mentioned Beanstalkd, which is an open source message queue. I remember looking at this application last year, however it did not have persistence at that time. Now it does, so I am seriously considering implementing it at work.

My other favourite talk was given by IBuildings on implementing Web Services. I am a supporter of RESTful APIs as they appeal to me as a simple and pragmatic way to expose services to other processes, without the complexity that SOAP brings.

The lunch supplied wasn't great, but the free beers at the end of the day provided by Facebook were much appreciated!

Cheat Sheets

Posted 25/10/2009 23:30

Today I found some very useful security and networking cheat sheets:

VPS.NET CentOS and NginX Load Balanced Cloud Cluster

Posted 24/10/2009 11:18

This week I have been experimenting with the cloud computing provider, VPS.NET.

The application I am trying to scale is a custom built PHP/MySQL web logging application, so unlike many web apps it has more database writes than reads. This is one of the challenges of scaling it, as a central database will be a single point of failure and a bottleneck.

About VPS.NET

The cheapest Virtual Machines at VPS.NET are £15 a month for 400MHz CPU, 256MB RAM, 10GB disk space and 250GB/month of bandwidth.

VPS.NET have an interesting approach to choosing the power of a virtual machine. Instead of upgrading (and paying for) individual virtual machines, you purchase 'nodes' of resources, which you can allocate to one or more virtual machines. This gives a lot of flexibility as you can move 'nodes' around between virtual machines within minutes, without having to change your billing amount.

Effectively this means that the virtual machines themselves are free, but the resources to run them are what you must pay for.

Scaling the Web App

I use a traditional load balancing layout of 1 load balancer (with 1 passive failover) and multiple application VPSs.

The load balancer in question is NginX, a very high performance web server and reverse proxy.

Each application VPS was allocated an internal IP address from VPS.NET so that internal traffic would not be counted towards the bandwidth quota. Then Apache, PHP and MySQL were installed onto each application VPS. The database writes are scaled by having each application VPS write to its own local database; the logging data is then merged together in a central reporting suite later.

CentOS Configuration

CentOS is a free community rebuild of the popular RedHat Enterprise Linux distribution.

The only tuning I made to the basic CentOS image provided by VPS.NET was to turn off the firewall. Obviously this is not very secure, and I intend to re-enable it later. During the testing phase, however, it was causing me problems: the sheer number of TCP connections was filling up the iptables connection tracking table, which then dropped packets.

service iptables stop
chkconfig iptables off

NginX Load Balancer Configuration

I used the latest stable NginX release and also added in the Fair Upstream add-on module.

This is the NginX configuration:

upstream backend {
        fair;
        server ip1:80   max_fails=20 fail_timeout=10s;
        server ip2:80   max_fails=20 fail_timeout=10s;
        server ip3:80   max_fails=20 fail_timeout=10s;
        server ip4:80   max_fails=20 fail_timeout=10s;
}

server {
        listen          80 default;
        server_name     default;

        proxy_set_header        Host            $host;
        proxy_set_header        X-Real-IP       $remote_addr;
        proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_connect_timeout           5;
        proxy_send_timeout              10;
        proxy_read_timeout              15;

        location / {
                proxy_pass              http://backend;
                proxy_cache             off;
        }
}

Application VPS Configuration

Each application VPS runs Apache and MySQL using 1 node of VPS.NET power.

Admittedly this is probably not how I would run it in production, but I wanted to see just how much I could get out of 256MB of RAM!

Apache was tuned so that each instance would not serve more concurrent requests than it could realistically handle. This way the VPS would not get overloaded and slow to a crawl.

Apache's prefork settings:

StartServers          10
MinSpareServers       10
MaxSpareServers       20
ServerLimit          256
MaxClients            30
MaxRequestsPerChild 4000
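
A common rule of thumb for picking MaxClients is to divide the memory left over for Apache by the typical size of one child process. The Python sketch below shows that arithmetic only. The 100MB reserved for MySQL and the operating system and the 5MB per child are hypothetical figures chosen for illustration, not measurements from these servers.

```python
def estimate_max_clients(total_ram_mb, reserved_mb, per_child_mb):
    """Rough upper bound on Apache prefork children that fit in RAM."""
    usable_mb = total_ram_mb - reserved_mb
    if usable_mb <= 0 or per_child_mb <= 0:
        raise ValueError("no memory left for Apache, or invalid child size")
    return usable_mb // per_child_mb


# 256MB VPS; the other two numbers are assumptions, not measured values.
print(estimate_max_clients(total_ram_mb=256, reserved_mb=100, per_child_mb=5))
```

On a real server the per-child figure would come from looking at the resident memory of the running httpd processes under load.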

I also installed the APC PHP accelerator to get better PHP performance.

Siege Test Results

I used an HTTP benchmarking tool called Siege to repeatedly request the web application over the Internet (not on the same network as the VPSs).

Here are the results:

** SIEGE 2.69
** Preparing 500 concurrent users for battle.
The server is now under siege..      done.
Transactions:                      100000 hits
Availability:                      100.00 %
Elapsed time:                       99.71 secs
Data transferred:               80.20 MB
Response time:                        0.42 secs
Transaction rate:             *1002.91 trans/sec*
Throughput:                        0.80 MB/sec (6.4Mbits/sec)
Concurrency:                      *422.61*
Successful transactions:      100000
Failed transactions:                   0
Longest transaction:               21.01
Shortest transaction:                0.00
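
The derived figures in that output can be re-checked from the raw counters. The short Python sketch below recomputes them; the function name is made up, and the concurrency line is only an estimate (transaction rate multiplied by average response time), so it will not match Siege's own figure exactly.

```python
def siege_summary(transactions, elapsed_secs, data_mb, response_secs):
    """Recompute Siege's derived figures from its raw counters."""
    rate = transactions / elapsed_secs
    throughput_mb = data_mb / elapsed_secs
    return {
        "transactions_per_sec": rate,
        "mb_per_sec": throughput_mb,
        "mbit_per_sec": throughput_mb * 8,
        "approx_concurrency": rate * response_secs,
    }


summary = siege_summary(transactions=100000, elapsed_secs=99.71,
                        data_mb=80.20, response_secs=0.42)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The concurrency estimate comes out at about 421, slightly below the 422.61 reported, because the response time shown in the output is rounded to two decimal places.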

Conclusion

I was able to achieve 950-1000 transactions a second using this set-up, although the VPS cluster was being pushed to its limits. During the benchmark the NginX load balancer logged many connection errors to the application VPSs (as their MaxClients was set low), however the load on each application VPS did not exceed 4, and none of them started swapping to disk.

By configuring NginX with a short connect and read time out, it was able to redirect the request to another application VPS so that the client would not notice.

The beauty of this system is that adding additional application VPS nodes is simple with VPS.NET and so the cluster can be scaled with no downtime.

VPN vs Remote Desktop - Avoiding Split Tunneling

Posted 28/09/2009 21:06

I have been thinking about the pros and cons of implementing remote access using VPN (such as OpenVPN) vs. an application level remote access such as SSH or Remote Desktop.

One of the arguments I have seen made for using Remote Desktop over a VPN is that any viruses or malicious software running on the connecting user's computer cannot directly affect the services running inside the corporate network.

Another argument for Remote Desktop is that, unlike with a VPN, the user's computer network is not directly connected to the corporate network, so malicious traffic coming from the Internet cannot make its way into the remote network. With a VPN, this risk is associated with so-called 'split tunnelling'.

Split tunnelling is when a VPN connection is established on a user's computer, but not all of their network traffic is forwarded down the tunnel to the corporate network's gateway. Instead the VPN is configured only to send traffic for the subnets that belong to the corporate network down the VPN tunnel.

This is much more efficient, as normal Internet browsing still goes out of the end user's connection as normal. However it is arguably opening up a security hole because it could allow packets to be routed from the Internet directly into the internal corporate network via the end user's VPN tunnel.
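
The routing decision behind split tunnelling can be sketched in a few lines of Python. The subnets below are hypothetical examples, and a real VPN client does this with routing table entries rather than application code.

```python
import ipaddress

# Hypothetical corporate subnets that are pushed to the VPN client as routes.
CORPORATE_SUBNETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.50.0/24"),
]


def next_hop(destination):
    """Return 'vpn' for corporate destinations, 'local' for everything else."""
    address = ipaddress.ip_address(destination)
    if any(address in subnet for subnet in CORPORATE_SUBNETS):
        return "vpn"
    return "local"


print(next_hop("10.1.2.3"))       # corporate file server, goes down the tunnel
print(next_hop("93.184.216.34"))  # public web site, uses the user's own connection
```

Forcing all traffic down the tunnel is the same function with a single 0.0.0.0/0 entry in the list.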

Most VPN software has the ability to force all traffic down the VPN tunnel which prevents traffic from the external Internet being routed down the VPN accidentally. However this feature can be turned off by malicious clients, and there is always the possibility of clients enabling NAT on their computer to forge external traffic to appear to come from their VPN IP address.

So does this mean we shouldn't use VPNs?

I would argue not. Whilst accidental split tunnelling could cause problems, it can be mitigated by enabling the features in the VPN software to stop this and by configuring a firewall on the VPN terminator to ensure that traffic only comes down the VPN tunnel from the correct IP addresses (no external IP ranges!).
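
The check on the VPN terminator amounts to comparing each packet's source address with the address that client was actually given. Here is a minimal Python sketch of that idea; the client names and addresses are hypothetical, and in practice this would be expressed as firewall rules rather than code.

```python
import ipaddress

# Hypothetical table of authenticated clients and their assigned tunnel addresses.
ASSIGNED_ADDRESSES = {
    "alice": ipaddress.ip_address("10.8.0.6"),
    "bob": ipaddress.ip_address("10.8.0.10"),
}


def accept_packet(client_name, source_ip):
    """Accept a packet only if it comes from the address assigned to that client."""
    assigned = ASSIGNED_ADDRESSES.get(client_name)
    return assigned is not None and ipaddress.ip_address(source_ip) == assigned


print(accept_packet("alice", "10.8.0.6"))     # True: her own tunnel address
print(accept_packet("alice", "203.0.113.9"))  # False: external address routed via her tunnel
```

Traffic that a client has routed in from the Internet without NAT fails this check. NATed traffic would still pass, since it appears to come from the client's own VPN address.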

Any malicious user you are allowing to connect to your servers, either by VPN or by Remote Desktop is likely to be able to cause harm.

Even if they cannot directly route traffic from outside into the VPN, they may still be able to steal sensitive data from the internal network using a simple 'copy and paste' technique over Remote Desktop.

Jon Reed Web Development Blog

Posted 27/09/2009 16:22

My good friend Jon Reed has just started a web development blog.

I am looking forward to a lot of cutting edge web development tips in the coming months!

Back from Hurghada, Egypt - Into the thick of it!

Posted 27/09/2009 16:19

Yesterday morning at 2AM I got back from my holiday in Egypt.

I had a great time, although I am feeling a little bit sick now. I'm guessing it's the water out there.

Whilst I was there, my friend and I visited the Hed Kandi Beach Bar. What a spot, an open air club, right on the beach, playing House music. Perfect!

Just before I left the apartment where I was staying I received a series of SMS messages from my monitoring system alerting me that the ISP that I use was experiencing connectivity difficulties.

A quick call confirmed that the issues were being worked on, so there wasn't much more I could do, whether I was in Egypt or the UK.

Six hours later I arrived back home. After quickly checking my E-Mails to ensure all was OK (it was), I decided to get some sleep. Just as I clicked to shut down my computer, my phone went off with lots of SMS messages.

This time it was the other ISP we use at work having connectivity troubles. Talk about a Welcome Back Present!