Just a few words to close the case.
I ran several tests with clocked buffers (registers) in front of the output pins, following the hints given by the other users.
It seems to work (at least the spikes are gone).
So I declared
signal CypressData : std_logic_vector(15 downto 0);
in the declarative part of the architecture (before "begin"),
and replaced every assignment to "pinsCypressData", e.g.
when GRAY_STEP_XXX => pinsCypressData <= x"XXXX";
with
when GRAY_STEP_XXX => CypressData <= x"XXXX";
At the very end of the process I inserted
if rising_edge(pinClk125MHz) then
  pinsCypressData <= CypressData;
end if;
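For anyone finding this thread later, here is a minimal self-contained sketch of the same pattern. It is not my actual design: only pinClk125MHz, pinsCypressData and CypressData come from the snippets above, while the entity name GrayOut, the four steps GRAY_STEP_0 to GRAY_STEP_3 and the output values are hypothetical. The decode from the step to CypressData is purely combinational and can spike while the state bits settle; the final register means the pins only change on the clock edge, at the cost of one clock of latency.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; only the clock/pin/signal names are
-- taken from the original design.
entity GrayOut is
  port (
    pinClk125MHz    : in  std_logic;
    pinsCypressData : out std_logic_vector(15 downto 0)
  );
end entity GrayOut;

architecture rtl of GrayOut is
  -- Hypothetical four-step Gray-code stepper.
  type step_t is (GRAY_STEP_0, GRAY_STEP_1, GRAY_STEP_2, GRAY_STEP_3);
  signal step        : step_t := GRAY_STEP_0;
  signal CypressData : std_logic_vector(15 downto 0);
begin
  -- State register: advance one step per clock, wrapping around.
  stepper : process (pinClk125MHz)
  begin
    if rising_edge(pinClk125MHz) then
      case step is
        when GRAY_STEP_0 => step <= GRAY_STEP_1;
        when GRAY_STEP_1 => step <= GRAY_STEP_2;
        when GRAY_STEP_2 => step <= GRAY_STEP_3;
        when GRAY_STEP_3 => step <= GRAY_STEP_0;
      end case;
    end if;
  end process stepper;

  -- Combinational decode (illustrative values, one bit changes per step).
  -- This internal signal may glitch while "step" changes.
  decode : process (step)
  begin
    case step is
      when GRAY_STEP_0 => CypressData <= x"0000";
      when GRAY_STEP_1 => CypressData <= x"0001";
      when GRAY_STEP_2 => CypressData <= x"0003";
      when GRAY_STEP_3 => CypressData <= x"0002";
    end case;
  end process decode;

  -- Output register: the pins are only updated on the clock edge,
  -- so glitches on CypressData never reach them.
  outreg : process (pinClk125MHz)
  begin
    if rising_edge(pinClk125MHz) then
      pinsCypressData <= CypressData;
    end if;
  end process outreg;
end architecture rtl;
```

In this sketch the output register is a separate process instead of sitting at the end of the existing one; both describe the same 16 flip-flops in front of the pins.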
The number of used blocks increased by 10 for the 16 pins.
The spikes are gone.
I built a similar design with a binary counter instead of a Gray-code stepper, and its clocked outputs are stable as well.
My conclusion is that driving the output pins directly from combinational logic is not reliable, even though the same design built with 74xxx chips would work.
Strange but true.
Thanks for your patience and your help!