Hartley and Zisserman describe the homography calculation (or estimation) very well in their wonderful book "Multiple View Geometry in Computer Vision". However, they suggest solving the equation system by using a singular value decomposition (SVD). Another possible solution is to use just a QR decomposition, see e.g. wikipedia for details. QR decomposition is numerically stable and likely even faster than SVD. Therefore, and to learn a bit of raw LAPACK, I've implemented a homography calculation from four points into four points using QR factorization. In case of just four points used, my implementation appears to be more than twice as fast as cvFindHomography (which uses SVD, iirc), though this might be a bit of an unfair comparison (because it's built for more than four points, e.g. by using RANSAC which might add already some overhead).

Nevertheless, the code is here. I've used single precision, but changing that to double precision should be trivial. The error is extremely small, comparable to the one OpenCV does, although this depends on the LAPACK implementation chosen.