Forum Discussion

Altera_Forum
13 years ago

Can I upload Nios2 SMP system?

Hi, guys & Altera corp.

I'm now building a Nios2 SMP system for my research. It's still a little buggy and slow, but I have succeeded in booting the Linux kernel and running bash.


Linux version 2.6.30 (hamada@Messiah2) (gcc version 4.1.2 (Wind River Linux Sourcery G++ 4.1-176)) #1915 SMP Tue Sep 4 18:16:32 JST 2012
console  enabled 
Early printk initialized 
 
 
Linux/Nios II-MMU 
Altera Nios II-MMU support (C) 2004 Wind River Systems. 
init_bootmem_node(?,0x3d0, 0x0, 0x8000) 
free_bootmem(0x3d0000, 0x7c30000) 
reserve_bootmem(0x3d0000, 0x1000) 
Detected 1 available secondary CPU(s) 
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32512 
Kernel command line: kgdboc=ttyS0, 115200 kgdbwait 
NR_IRQS:32 
PID hash table entries: 512 (order: 9, 2048 bytes) 
Console: colour dummy device 80x25 
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes) 
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes) 
We have 32768 pages of RAM 
Memory available: 125824k/3902k RAM, 0k/0k ROM (1457k kernel code, 2445k data) 
Calibrating delay loop... 19.55 BogoMIPS (lpj=97792) 
Mount-cache hash table entries: 512 
CPU1: Booted secondary processor 
Calibrating delay loop... 19.96 BogoMIPS (lpj=99840) 
Brought up 2 CPUs 
SMP: Total of 2 processors activated (39.52 BogoMIPS). 
init_BSP(): registering device resources 
bio: create slab <bio-0> at 0 
msgmni has been set to 246 
io scheduler noop registered 
io scheduler anticipatory registered 
io scheduler deadline registered 
io scheduler cfq registered (default) 
ttyJ0 at MMIO 0xa60a440 (irq = 2) is a Altera JTAG UART 
console handover: boot  -> real  
ttyS0 at MMIO 0x8000060 (irq = 3) is a Altera UART 
ifconfig: socket: Function not implemented 
ifconfig: socket: Function not implemented 
Welcome to 
          ____ _  _ 
         /  __| ||_| 
    _   _| |  | | _ ____  _   _  _  _ 
   | | | | |  | || |  _ \| | | |\ \/ / 
   | |_| | |__| || | | | | |_| |/    \ 
   |  ___\____|_||_|_| |_|\____|\_/\_/ 
   | | 
   |_| 
 
For further information check: 
http://www.uclinux.org/ 
 
Why came here? CPU0, task inetd pte c71f4c40, entry 07a0704b, address 2ab10000 
 
 
BusyBox v1.14.2 (2012-06-26 16:39:29 JST) hush - the humble shell 
Enter 'help' for a list of built-in commands. 
 
/#  ls 
bin   etc   init  mnt   root  sys   usr 
dev   home  lib   proc  sbin  tmp   var 
/#  bash
#  ls -lp
drwxr-xr-x    2 root     root            0 Sep  4  2012 bin/ 
drwxr-xr-x    6 root     root            0 Sep  4  2012 dev/ 
drwxr-xr-x    5 root     root            0 Sep  4  2012 etc/ 
drwxr-xr-x    3 root     root            0 Sep  4  2012 home/ 
lrwxrwxrwx    1 root     root           10 Sep  4  2012 init -> /sbin/init 
drwxr-xr-x    3 root     root            0 Sep  4  2012 lib/ 
drwxr-xr-x    2 root     root            0 Sep  4  2012 mnt/ 
dr-xr-xr-x   34 root     root            0 Nov 30 00:00 proc/ 
drwxr-xr-x    2 root     root            0 Sep  4  2012 root/ 
lrwxrwxrwx    1 root     root            3 Sep  4  2012 sbin -> bin/ 
drwxr-xr-x   11 root     root            0 Nov 30 00:00 sys/ 
drwxr-xr-x    2 root     root            0 Nov 30 00:01 tmp/ 
drwxr-xr-x    5 root     root            0 Sep  4  2012 usr/ 
drwxr-xr-x    7 root     root            0 Nov 30 00:01 var/
#  cat /proc/cpuinfo
CPU:         NIOS2 MultiCore 
MMU:            ways:16 entries:512 
FPU:            none 
Clocking:       <not supported> 
BogoMips:       19.96 
Calibration:    9984000 loops 
CPU:         NIOS2 MultiCore 
MMU:            ways:16 entries:512 
FPU:            none 
Clocking:       <not supported> 
BogoMips:       19.96 
Calibration:    9984000 loops
#  cat /proc/interrupts
           CPU0 
  0:      13931     NIOS2-INTC  timer 
  2:        133     NIOS2-INTC  JTAGUART 
  3:          0     NIOS2-INTC  UART 
 30:       4875     NIOS2-INTC  IPI 0 
 31:      17375     NIOS2-INTC  IPI 1
#  cat /proc/stat
cpu  124 0 27259 1939 0 0 2 0 0 
cpu0 54 0 13448 1183 0 0 2 0 0 
cpu1 70 0 13811 756 0 0 0 0 0 
intr 38284 14687 0 163 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 515 
9 18275 
ctxt 33554 
btime 943920000 
processes 674 
procs_running 2 
procs_blocked 0
#

The CPU core is a clone of the genuine Nios2/f core, and almost all features are implemented except for some details of the 1st-level data cache.

If anyone is interested, I would like to upload these to the 'Altera Wiki', but there is a problem: the CPU is a clone, and Altera corp. holds the copyright on the Nios2 instruction set and architecture. If Altera corp. kindly permits me to upload everything, including the hardware source code, that would be the best way. If not, how can we share these results?

Kazu

15 Replies

  • Altera_Forum

    --- Quote Start ---

    I succeeded in making a Nios2 SMP system using normal Nios2/f cores.

    --- Quote End ---

    Sounds great.

    But how did you handle cache synchronization and the inter-CPU atomic operations that are needed for the mutex API and the various kernel-internal synchronization mechanisms?

    (I understand that this is close to impossible without modifying the CPU design.)

    -Michael
  • Altera_Forum

    Hi,

    --- Quote Start ---

    But how did you handle cache synchronization and the inter-CPU atomic operations that are needed for the mutex API and the various kernel-internal synchronization mechanisms?

    (I understand that this is close to impossible without modifying the CPU design.)

    --- Quote End ---

    First, I removed the normal data cache from the Nios2/f core (by selecting the data cache <none> option in SOPC Builder) and added my own 1st-level (write-through) data cache. The cache synchronization method is the same as the one used with the clone. For atomic memory operations, I implemented a 'swap' that is controlled by a custom instruction. Unfortunately, the cacheable/non-cacheable information is not available outside the Nios2 core, so I changed the kernel memory mapping as follows:

    
    0xc0000000-0xcfffffff  : cacheable
    0xd0000000-0xdfffffff  : non-cacheable
    0xe0000000-0xefffffff  : cacheable
    0xf0000000-0xffffffff  : non-cacheable
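
In the mapping above, address bit 28 is what separates the cacheable windows from the non-cacheable ones. A minimal C sketch of that bit test follows. The macro and function names are hypothetical, and treating the paired windows as aliases of the same memory is an assumption, not something stated in the post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Start of the kernel windows listed above (0xc0000000-0xffffffff). */
#define KERNEL_BASE  UINT32_C(0xc0000000)
/* Address bit 28: 0 = cacheable window, 1 = non-cacheable window.
 * The macro name is hypothetical. */
#define UNCACHED_BIT (UINT32_C(1) << 28)

static bool in_kernel_window(uint32_t addr)
{
    return addr >= KERNEL_BASE;
}

/* True if addr falls in 0xc... or 0xe..., false for 0xd... or 0xf... */
static bool is_cacheable(uint32_t addr)
{
    assert(in_kernel_window(addr));
    return (addr & UNCACHED_BIT) == 0;
}

/* Assumption: setting bit 28 yields the non-cacheable alias of the
 * same location, and clearing it yields the cacheable alias. */
static uint32_t to_uncached(uint32_t addr)
{
    assert(in_kernel_window(addr));
    return addr | UNCACHED_BIT;
}

static uint32_t to_cached(uint32_t addr)
{
    assert(in_kernel_window(addr));
    return addr & ~UNCACHED_BIT;
}
```

With this layout, external hardware such as a cache controller can decide cacheability from the address alone, without any side-band signal from the core.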
    

    Kazu
  • Altera_Forum

    --- Quote Start ---

    I implemented a 'swap' that is controlled by a custom instruction.

    --- Quote End ---

    AFAIK, that would require an additional memory interface for this instruction, as the NIOS infrastructure does not allow a custom instruction to use the processor's memory interface. This of course rules out a cache inside the processor. I suppose using an external cache (aka 2nd-level cache) instead of the 1st-level cache provided by Altera will slow down the CPU a lot.

    --- Quote Start ---

    Unfortunately, the cacheable/non-cacheable information is not available outside the Nios2 core, so I changed the kernel memory mapping

    --- Quote End ---

    Maybe you could use the old A31-trick (A31=1 -> cache bypassed). With that you could define non-cacheable regions using the MMU target address.

    But I don't think the problem with inter-CPU atomic instructions is solvable :(.

    -Michael
  • Altera_Forum

    Hi,

    --- Quote Start ---

    AFAIK, that would require an additional memory interface for this instruction, as the NIOS infrastructure does not allow a custom instruction to use the processor's memory interface. This of course rules out a cache inside the processor. I suppose using an external cache (aka 2nd-level cache) instead of the 1st-level cache provided by Altera will slow down the CPU a lot.

    --- Quote End ---

    To build an SMP system with the normal Nios2, we must achieve the following two points.

    1) An atomic read-write memory instruction.

    2) Coherency of the 1st-level data caches.

    Atomic memory instructions are like the game 'Beach Flags' (in this case there is only one flag, and it corresponds to a lock variable). So the flag must be placed in the 2nd-level cache or main memory, not in the 1st-level caches. This means that the bus lock for atomic instructions is required between the 1st-level cache and the 2nd-level cache, not between the CPU and the 1st-level cache, so we can achieve 1) without tampering with Altera's data cache. But for 2), there is no way for external hardware to flush a targeted line, so it is impossible to achieve without removing the normal data cache.
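
The 'one flag' picture can be sketched in C as a lock built on an atomic swap. This is a minimal sketch, not the actual hardware or kernel code: C11 `atomic_exchange` stands in for the custom 'swap' instruction, and the names `swap_lock_t`, `swap`, `swap_trylock`, `swap_lock` and `swap_unlock` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* The single "flag": 0 = free, 1 = taken. On the real system it would
 * have to live in the 2nd-level cache or main memory. */
typedef struct {
    atomic_uint flag;
} swap_lock_t;

/* Stand-in for the custom 'swap' instruction (assumption): atomically
 * store new_value and return the value that was there before. */
static unsigned swap(atomic_uint *addr, unsigned new_value)
{
    return atomic_exchange(addr, new_value);
}

static void swap_lock_init(swap_lock_t *l)
{
    atomic_init(&l->flag, 0);
}

/* Only the CPU that swaps a 1 in and gets a 0 back has grabbed the flag. */
static bool swap_trylock(swap_lock_t *l)
{
    return swap(&l->flag, 1) == 0;
}

/* Spin until the flag is grabbed. */
static void swap_lock(swap_lock_t *l)
{
    while (!swap_trylock(l)) {
        /* busy-wait */
    }
}

static void swap_unlock(swap_lock_t *l)
{
    assert(atomic_load(&l->flag) == 1); /* must currently be held */
    atomic_store(&l->flag, 0);
}
```

Every CPU that loses the race reads back 1 and keeps spinning, which is why the swap has to be resolved at a level shared by all CPUs rather than in a private 1st-level cache.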

    Of course, we must accept the disadvantage of adding an external 1st-level data cache. It makes the CPU slower, but not by a lot. Currently a read or write between the CPU and the external 1st-level cache takes 3 clocks on a cache hit. But code does not consist entirely of 'load' and 'store' instructions, so the impact is limited. (Few memory accesses is a major premise of RISC processors, though it is sometimes broken :D.)
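
A rough first-order way to put a number on 'slow, but not a lot' is sketched below. Only the 3-clock hit time comes from the post. The load/store fraction and the baseline hit time are hypothetical inputs, and the model ignores misses and pipeline overlap.

```c
#include <assert.h>

/* Average extra cycles per instruction caused by a slower cache hit.
 * mem_fraction:    share of instructions that are loads or stores
 *                  (hypothetical input, 0.0 to 1.0)
 * hit_cycles:      hit time of the external cache (3 in the post)
 * baseline_cycles: hit time of the cache it replaces (assumed input) */
static double extra_cycles_per_insn(double mem_fraction,
                                    unsigned hit_cycles,
                                    unsigned baseline_cycles)
{
    assert(mem_fraction >= 0.0 && mem_fraction <= 1.0);
    assert(hit_cycles >= baseline_cycles);
    return mem_fraction * (double)(hit_cycles - baseline_cycles);
}
```

For example, if a quarter of the instructions were loads or stores and the baseline hit took 1 clock, `extra_cycles_per_insn(0.25, 3, 1)` gives 0.5 extra cycles per instruction. The cost scales with the share of memory instructions, not with the whole instruction stream.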

    There are also some advantages to adopting an external 1st-level cache. We can make all the caches physically indexed and physically tagged, so the 1st-level data cache can be enlarged beyond 4 Kbytes without synonym problems. Moreover, the bus between the 1st-level and 2nd-level caches can be a custom design, e.g. with a wider bus or with simultaneous read and write. I adopted a 128-bit bus, and the peak data rate reaches 1.6 GBytes/sec (@100 MHz).
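
The two numbers in this paragraph can be checked with a little arithmetic, sketched here in C. The function names are hypothetical. The bandwidth figure assumes one full-width transfer per clock, and the synonym check assumes a virtually indexed cache with 4-Kbyte pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Peak bus bandwidth in bytes per second, assuming one transfer of
 * bus_bits bits on every clock cycle. */
static uint64_t peak_bytes_per_sec(unsigned bus_bits, uint64_t clock_hz)
{
    assert(bus_bits % 8 == 0);
    return (uint64_t)(bus_bits / 8) * clock_hz;
}

/* For a virtually indexed cache, synonyms become possible once one way
 * spans more than a page, because the index then uses address bits
 * that the MMU translates. A physically indexed cache has no such limit. */
static bool virtual_index_can_alias(uint32_t cache_bytes, uint32_t ways,
                                    uint32_t page_bytes)
{
    assert(ways > 0 && page_bytes > 0);
    return cache_bytes / ways > page_bytes;
}
```

`peak_bytes_per_sec(128, 100000000)` evaluates to 1600000000, i.e. 16 bytes per clock at 100 MHz, which matches the 1.6 GBytes/sec quoted above.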

    --- Quote Start ---

    Maybe you could use the old A31-trick (A31=1 -> cache bypassed). With that you could define non-cacheable regions using the MMU target address.

    --- Quote End ---

    Yes, I used an A28-trick.

    --- Quote Start ---

    But I don't think the problem with inter-CPU atomic instructions is solvable :(.

    --- Quote End ---

    If it were unsolvable, Linux would never boot ;).

    Kazu