Hi,
Thank you, Hippo. Now I'm trying to use a swap file. First of all, Wind River didn't implement the swap functions. Moreover, they didn't use 'vmalloc' at all, which means they left us the last wonderland of the Nios MMU (bugs :D).
Unfortunately, the swapon system call uses 'vmalloc', so we must fix that bug first.
In the file '/home/***/nios2-linux/linux-2.6/arch/nios2/mm/fault.c', the code
vmalloc_fault:
{
/*
* Synchronize this task's top level page-table
* with the 'reference' page table.
*
* Do _not_ use "tsk" here. We might be inside
* an interrupt in the middle of a task switch..
*/
#define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
#define __pgd_offset(address) pgd_index(address)
int offset = __pgd_offset(address);
pgd_t *pgd, *pgd_k;
pud_t *pud, *pud_k;
pmd_t *pmd, *pmd_k;
pte_t *pte_k;
#if 1
/* FIXME: Is this entirely correct?
*/
pgd = (pgd_t *) pgd_current + offset;
#else
pgd = &current->mm->pgd;
#endif
pgd_k = init_mm.pgd + offset;
if (!pgd_present(*pgd_k))
goto no_context;
set_pgd(pgd, *pgd_k);
pud = pud_offset(pgd, address);
pud_k = pud_offset(pgd_k, address);
if (!pud_present(*pud_k))
goto no_context;
pmd = pmd_offset(pud, address);
pmd_k = pmd_offset(pud_k, address);
if (!pmd_present(*pmd_k))
goto no_context;
set_pmd(pmd, *pmd_k);
pte_k = pte_offset_kernel(pmd_k, address);
if (!pte_present(*pte_k)) {
goto no_context;
}
flush_tlb_one(address); // <-- Add this to flush the contents of TLB.
return;
}
copies (shares) the page tables, but they forgot to flush the old TLB contents. Without the flush, the Nios CPU keeps using the stale TLB entry (the one that generated the page fault) forever, and the kernel freezes.
As for the swap functionality, we must implement the following functions in the file '/home/***/nios2-linux/linux-2.6/arch/nios2/mm/pgtable.c'.
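To illustrate why the flush matters, here is a toy model in plain user-space C. It is only a sketch under my own assumptions: a single cached translation stands in for the TLB, and every name in it (toy_lookup, toy_set_pte, toy_flush_tlb_one, PERM_*) is made up for the illustration. It is not the Nios II hardware or kernel code.

```c
#include <assert.h>

/* Toy model only: one "page table" entry and one cached copy of it.
 * All names are hypothetical and exist only for this illustration. */
#define PERM_NONE 0u   /* no access allowed: every access faults */
#define PERM_RW   1u   /* access allowed */

static unsigned toy_pte = PERM_NONE;  /* entry in the "page table" */
static unsigned toy_tlb;              /* cached copy held by the "TLB" */
static int toy_tlb_valid = 0;

/* Translate through the TLB; refill from the page table only on a miss. */
static inline unsigned toy_lookup(void)
{
    if (!toy_tlb_valid) {
        toy_tlb = toy_pte;
        toy_tlb_valid = 1;
    }
    return toy_tlb;
}

/* Update the page table entry without touching the cached copy. */
static inline void toy_set_pte(unsigned perm)
{
    toy_pte = perm;
}

/* Stand-in for flush_tlb_one(): drop the cached entry. */
static inline void toy_flush_tlb_one(void)
{
    toy_tlb_valid = 0;
}

/* Replays the vmalloc_fault sequence on the toy model. */
static inline void toy_vmalloc_fault_demo(void)
{
    assert(toy_lookup() == PERM_NONE); /* first access faults and is cached */
    toy_set_pte(PERM_RW);              /* handler syncs the page table */
    assert(toy_lookup() == PERM_NONE); /* stale entry: would fault again */
    toy_flush_tlb_one();               /* the missing step */
    assert(toy_lookup() == PERM_RW);   /* refilled from the repaired table */
}
```

In this model the second lookup still sees the old entry even though the page table has been repaired, which corresponds to the endless page fault described above.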
/* Swap not implemented
*/
swp_entry_t __pte_to_swp_entry(pte_t pte){BUG();}
pte_t __swp_entry_to_pte(swp_entry_t swp){BUG();}
unsigned long __swp_type(swp_entry_t swp){BUG();}
pgoff_t __swp_offset(swp_entry_t swp){BUG();}
swp_entry_t __swp_entry(unsigned long type, pgoff_t offset){BUG();}
To swap a page out, the kernel must remember where the page is stored. Since it isn't reasonable to allocate new memory during a memory shortage, Linux reuses the page table entry itself to remember two parameters. The parameter 'type' distinguishes the swap files, and 'offset' gives the page location (offset) from the head of the swap file. The function '__swp_entry' builds the new content of the page table entry for a page that is being swapped out. Of course, this fake page table entry may be loaded into the Nios TLB, so it must generate a page fault. For this purpose a normal CPU's MMU has a present bit, but the MMU of Nios is a little bit strange.
tlbacc Control Register Fields
| 31 ... 25 | 24 | 23 | 22 | 21 | 20 | 19 ... 0 |
|    IG     | C  | R  | W  | X  | G  |   PFN    |
To generate a page fault, we must clear R, W and X (R = W = X = 0). The IG bits are ignored by the hardware, so they never generate a hardware page fault by themselves; the kernel uses them as follows.
/* TLBACC also has 7 IGNORE bits to use for SW defined attributes
*/
#define _PAGE_PRESENT (1<<5)
#define _PAGE_ACCESSED (1<<6)
#define _PAGE_MODIFIED (1<<7)
#define _PAGE_FILE (1<<8)
#define _PAGE_VALID (1<<9)
#define _PAGE_OLD (1<<10)
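To make the bit positions concrete, here is a small user-space sketch. It assumes (as the '>> 20' and '<< 20' in the kernel snippets in this post suggest) that the _PAGE_* values are relative to bit 20 of the PTE. The TLBACC_* and SW_PAGE_* names and the helper functions are mine, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions taken from the tlbacc layout in this post.
 * TLBACC_* and SW_PAGE_* are hypothetical names for this sketch. */
#define TLBACC_PFN_MASK 0xfffffu     /* bits 19..0  */
#define TLBACC_G        (1u << 20)
#define TLBACC_X        (1u << 21)
#define TLBACC_W        (1u << 22)
#define TLBACC_R        (1u << 23)
#define TLBACC_C        (1u << 24)
#define TLBACC_IG_SHIFT 25           /* bits 31..25 */

/* Software flags with the same values as the quoted _PAGE_* defines;
 * assumed to be relative to bit 20 of the PTE. */
#define SW_PAGE_PRESENT (1u << 5)
#define SW_PAGE_FILE    (1u << 8)
#define SW_PAGE_VALID   (1u << 9)

/* Absolute mask inside the 32-bit PTE for a software flag. */
static inline uint32_t sw_flag_to_pte_mask(uint32_t flag)
{
    return flag << 20;
}

/* Extract the 7-bit IG field. */
static inline uint32_t tlbacc_ig(uint32_t pte)
{
    return pte >> TLBACC_IG_SHIFT;
}

/* Nonzero if the hardware would permit any access through this entry. */
static inline int hw_accessible(uint32_t pte)
{
    return (pte & (TLBACC_R | TLBACC_W | TLBACC_X)) != 0;
}
```

With this reading, _PAGE_PRESENT lands on bit 25 and _PAGE_VALID on bit 29, both inside the IG field, so setting them alone leaves R = W = X = 0.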
The '_PAGE_FILE' bit is used for another purpose (nonlinear file mappings), and to reach the function 'do_swap_page' in the following code,
static inline int handle_pte_fault(struct mm_struct *mm,
struct vm_area_struct *vma, unsigned long address,
pte_t *pte, pmd_t *pmd, int write_access)
{
pte_t entry;
spinlock_t *ptl;
entry = *pte;
if (!pte_present(entry)) {
if (pte_none(entry)) {
if (vma->vm_ops) {
if (likely(vma->vm_ops->fault))
return do_linear_fault(mm, vma, address,
pte, pmd, write_access, entry);
}
return do_anonymous_page(mm, vma, address,
pte, pmd, write_access);
}
if (pte_file(entry))
return do_nonlinear_fault(mm, vma, address,
pte, pmd, write_access, entry);
return do_swap_page(mm, vma, address,
pte, pmd, write_access, entry);
}
we must clear the _PAGE_PRESENT and _PAGE_FILE bits but set at least one other bit, because the function 'pte_none' checks those bits as follows:
/* FIXME: Today unmapped pages are mapped to the low physical addresses
* and not 0 (to avoid to trigger the false alias detection in the iss)
* Also check pte_clear.
*/
int pte_none(pte_t pte)
{
#if 0
return (!((pte_val(pte) >> 20) & ~_PAGE_GLOBAL));
#else
return (!((pte_val(pte) >> 20) & ~(_PAGE_GLOBAL|0xf)));
#endif
}
So I designed the functions as follows:
swp_entry_t __pte_to_swp_entry(pte_t pte){return (swp_entry_t) {pte_val(pte)};}
pte_t __swp_entry_to_pte(swp_entry_t swp){return (pte_t){swp.val};}
unsigned long __swp_type(swp_entry_t swp){return ((swp.val >> 26) & 0x3);}
pgoff_t __swp_offset(swp_entry_t swp){return (swp.val & 0xfffff);}
swp_entry_t __swp_entry(unsigned long type, pgoff_t offset){return (swp_entry_t){(_PAGE_VALID << 20) | ((type & 0x3) << 26) | (offset & 0xfffff)};}
This limits the number of swap files to 4 (the 'type' field is only 2 bits wide), but I think that is enough :D.
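As a sanity check of this encoding, here is a user-space sketch that mirrors the functions above on plain 32-bit values. The toy_* names are hypothetical, _PAGE_GLOBAL = (1<<0) is my assumption (it is not shown in this post), and the toy_pte_present / toy_pte_file tests are simplified stand-ins that may differ from the real kernel macros.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as the _PAGE_* defines quoted earlier, relative to bit 20.
 * SW_PAGE_GLOBAL is an assumption; it does not appear in this post. */
#define SW_PAGE_GLOBAL  (1u << 0)
#define SW_PAGE_PRESENT (1u << 5)
#define SW_PAGE_FILE    (1u << 8)
#define SW_PAGE_VALID   (1u << 9)

/* Mirror of __swp_entry: VALID marker, 2-bit type, 20-bit offset. */
static inline uint32_t toy_swp_entry(uint32_t type, uint32_t offset)
{
    return (SW_PAGE_VALID << 20) | ((type & 0x3u) << 26) | (offset & 0xfffffu);
}

static inline uint32_t toy_swp_type(uint32_t val)
{
    return (val >> 26) & 0x3u;
}

static inline uint32_t toy_swp_offset(uint32_t val)
{
    return val & 0xfffffu;
}

/* Mirror of the pte_none() variant that is compiled in (#else branch). */
static inline int toy_pte_none(uint32_t val)
{
    return !((val >> 20) & ~(SW_PAGE_GLOBAL | 0xfu));
}

/* Simplified stand-ins for pte_present() and pte_file(). */
static inline int toy_pte_present(uint32_t val)
{
    return ((val >> 20) & SW_PAGE_PRESENT) != 0;
}

static inline int toy_pte_file(uint32_t val)
{
    return ((val >> 20) & SW_PAGE_FILE) != 0;
}

/* True if handle_pte_fault() would fall through to do_swap_page(). */
static inline int toy_reaches_do_swap_page(uint32_t val)
{
    return !toy_pte_present(val) && !toy_pte_none(val) && !toy_pte_file(val);
}
```

Under these assumptions every (type, offset) pair round-trips and takes the do_swap_page path, while R, W and X stay 0. Note that the masks truncate silently: an offset above 0xfffff or a type above 3 wraps around instead of being rejected.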
After the configuration
Kernel/Library/Defaults Selection --->
[*] Customize Kernel Settings (exit) (exit)
General setup --->
[*] Support for paging of anonymous memory (swap)
Kernel/Library/Defaults Selection --->
[*] Customize Application/Library Settings (exit) (exit)
BusyBox ---> Linux System Utilities --->
[*] swaponoff
you can enable swap, e.g. with the command
>swapon /dev/***
Of course, you must prepare some storage for it first.
I ran some stress tests on my NEEK, and it works well. Here is a snapshot of '/proc/meminfo':
MemTotal: 26076 kB
MemFree: 5080 kB
Buffers: 0 kB
Cached: 11980 kB
SwapCached: 88 kB
Active: 2260 kB
Inactive: 2260 kB
Active(anon): 2260 kB
Inactive(anon): 2260 kB
Active(file): 0 kB
Inactive(file): 0 kB
Unevictable: 11980 kB
Mlocked: 0 kB
SwapTotal: 351992 kB
SwapFree: 351648 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 4520 kB
Mapped: 2036 kB
Slab: 1716 kB
SReclaimable: 336 kB
SUnreclaim: 1380 kB
PageTables: 208 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 365028 kB
Committed_AS: 8052 kB
VmallocTotal: 1048575 kB
VmallocUsed: 176 kB
VmallocChunk: 1048399 kB
But Nano-X sometimes triggers the 'oom-killer'. Why?
Kazu