Sorry, I'm not sure I have fully understood your suggestion. Are you telling me to apply this transform only to my input data, or to all the operations performed in the kernel function?
Given the range of numbers you're working with, you can probably get
away with a 16.16 fixed-point representation (a 32-bit integer with
16 integer bits and 16 fractional bits, so 1.0 is stored as 65536).
The operations go like this:
convert a double to a fixed-point number (do this outside the kernel, not in it):
fixed = (s32)(value * 65536.0);   /* value is the double being converted */
convert an integer to fixed point:
fixed = integer << 16;   /* for negative integers, integer * 65536 avoids undefined behaviour */
multiplication:
result = (s32)(((s64)fixed_a * fixed_b) >> 16);
addition:
result = fixed_a + fixed_b;
etc.
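Putting the operations above together, here is a minimal C sketch. The helper names (fix_from_double, fix_from_int, fix_mul, fix_add, fix_to_double) are hypothetical, s32/s64 are assumed to be int32_t/int64_t typedefs, and it assumes `>>` on a negative 64-bit value is an arithmetic shift, which C leaves implementation-defined but GCC and Clang provide. It is written as plain host-side C; the arithmetic helpers are the parts you would carry over into the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: s32/s64 are the usual signed 32/64-bit typedefs. */
typedef int32_t s32;
typedef int64_t s64;

#define FIX_SHIFT 16
#define FIX_ONE   (1 << FIX_SHIFT)   /* 1.0 in 16.16 = 65536 */

/* Host side only: double -> 16.16, truncating toward zero like the cast. */
static s32 fix_from_double(double value) {
    /* Out-of-range values would make the (s32) cast undefined. */
    assert(value >= -32768.0 && value < 32768.0);
    return (s32)(value * 65536.0);
}

/* int -> 16.16. Multiplying instead of shifting keeps negative inputs well-defined. */
static s32 fix_from_int(s32 integer) {
    assert(integer >= -32768 && integer <= 32767);
    return integer * FIX_ONE;
}

/* 16.16 * 16.16: the raw product is 32.32, so widen to 64 bits before
 * shifting the extra 16 fractional bits away. Assumes >> on a negative
 * s64 is an arithmetic shift (implementation-defined in C). */
static s32 fix_mul(s32 a, s32 b) {
    return (s32)(((s64)a * b) >> FIX_SHIFT);
}

/* Addition needs no rescaling: both operands share the same scale. */
static s32 fix_add(s32 a, s32 b) {
    return a + b;
}

/* 16.16 -> double, handy for checking results on the host. */
static double fix_to_double(s32 fixed) {
    return fixed / 65536.0;
}
```

For example, fix_mul(fix_from_double(3.25), fix_from_int(2)) evaluates to 425984, which fix_to_double turns back into 6.5.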
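As long as your values (including intermediate results) stay within the
16.16 range of about -32768 to +32767.99998, and you don't need more than
16 bits of fractional precision (a resolution of 1/65536), you'll have no
problem with this approach.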
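To make those two limits concrete: a signed 16.16 value covers roughly -32768 to +32768 in steps of 1/65536 (about 0.000015), so smaller fractions truncate to zero, and a product can overflow even when both inputs fit (200 * 200 = 40000 does not fit). The sketch below shows one way to check for that. fix_fits and fix_mul_checked are hypothetical helper names, and it again assumes s32/s64 are int32_t/int64_t and that `>>` on a negative 64-bit value is an arithmetic shift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: s32/s64 are the usual signed 32/64-bit typedefs. */
typedef int32_t s32;
typedef int64_t s64;

/* Resolution of 16.16: anything finer than 1/65536 (~0.0000153) is lost. */
#define FIX_EPSILON   (1.0 / 65536.0)
/* Representable range of a signed 16.16 value. */
#define FIX_MIN_VALUE (-32768.0)
#define FIX_MAX_VALUE (32768.0 - FIX_EPSILON)

/* Hypothetical helper: does a double fit in 16.16 without overflowing? */
static bool fix_fits(double value) {
    return value >= FIX_MIN_VALUE && value <= FIX_MAX_VALUE;
}

/* Hypothetical helper: 16.16 multiply that reports overflow instead of
 * silently truncating. Returns false if the rescaled product does not fit
 * in 32 bits; otherwise stores it in *out and returns true. */
static bool fix_mul_checked(s32 a, s32 b, s32 *out) {
    assert(out);
    s64 wide = ((s64)a * b) >> 16;   /* arithmetic shift assumed for negatives */
    if (wide < INT32_MIN || wide > INT32_MAX)
        return false;
    *out = (s32)wide;
    return true;
}
```

Checking every multiply like this inside the kernel costs some speed, so it may be enough to validate the value ranges once on the host with something like fix_fits.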
I hope this helps,