Just to update you, Dave: the problem was indeed that the unused pins were set to "as outputs driving ground". When I changed that setting to "tristate inputs with weak pull up", the voltage went back to 3.3V.
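I have the data transferring now. I've also changed the code to run from the 8 MHz global clock (CLK) instead of clocking the logic directly from SCK, as my old design did. SCK is now only sampled by CLK so that its edges can be detected. Here is the relevant part of the architecture: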
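begin

  -- Oversample SCK with CLK. resync(1) holds the newest SCK sample and
  -- resync(3) the oldest; rise/fall go high for one CLK period per SCK edge.
  -- (rise is not used in this excerpt.)
  sync1: process(CLK)
    variable resync: std_logic_vector(1 to 3);
  begin
    if rising_edge(CLK) then
      rise <= resync(2) and not resync(3);
      fall <= resync(3) and not resync(2);
      resync := SCK & resync(1 to 2);
    end if;
  end process;

  -- While nCS is high, load the parallel word PI asynchronously.
  -- While nCS is low, shift left (MSB first) on each detected falling SCK edge.
  -- Each shift happens a few CLK periods after the real SCK edge, so SCK must
  -- be run much slower than the 8 MHz CLK.
  shift_reg: process (CLK, nCS)
  begin
    if (nCS = '1') then
      tmp <= PI;
    elsif rising_edge(CLK) then
      if (fall = '1') then
        tmp <= tmp(PI'high - 1 downto PI'low) & '0';
      end if;
    end if;
  end process;

  -- Drive the current MSB onto SO only while selected; otherwise tristate it.
  SO <= tmp(PI'high) when nCS = '0' else 'Z';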