The current generation of CPUs does indeed contain the hardware that was once packaged separately as a math co-processor. That hardware has since evolved with the addition of the SSE instruction sets. The uniprocessor client has used SSE for quite a few years now, which lets it process four floating point operations simultaneously. Clock speeds have risen from the early days of the PIII (at about 800 MHz) right up to those of the most modern CPUs we have today. In contrast, GPUs typically process several hundred floating point operations simultaneously.
Nevertheless, GPUs are limited in the complexity of the code they can run and, as has already been said, are not capable of running all of the types of calculations needed by FAH. The ones they can run, however, are extremely fast.
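To make the "four floating point operations simultaneously" point concrete, here is a minimal sketch in C of what a packed SSE addition looks like. It is only an illustration and is not taken from the FAH client: the function name `add_arrays` is hypothetical, and the code assumes a compiler that defines `__SSE__` (GCC and Clang do on x86-64). Without that macro it simply falls back to the one-at-a-time loop.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for n single-precision floats. */
void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;

#if defined(__SSE__)
    /* Each _mm_add_ps performs four single-precision additions at once. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* load 4 floats (unaligned ok) */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif

    /* Scalar loop: handles the leftover elements, or every element
       when SSE is not available at compile time. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Calling `add_arrays(a, b, out, 6)` on six-element arrays would take one packed step for the first four values and two scalar steps for the rest. A GPU applies the same idea across hundreds of lanes rather than four, which is why it is so fast on the kinds of calculations it can run.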