assembly - PIC32 speed: Optimizing C code
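I want suggestions for optimizing this simple piece of code. I need it to be fast, and by fast I mean less than 250 ns per loop iteration.
The first version was slow, about 1000 ns. After some work it runs in 550 ns, but I believe it can be done faster and I don't know how.
I am using a PIC32 with an 80 MHz system clock.
The code: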
void main() {
    unsigned long int arr_1[4095];
    unsigned long int arr_2[4095];

    // here assign arr_1 and arr_2 values
    // ...
    // ...

    trisc = 0;
    trisd = 0;

    while (1) {
        latc = arr_1[porte];
        latd = arr_2[porte];
    }
}
As you can see, it is a very simple job; the only problem is speed.
I looked at the assembly listing to see how many instructions there are, but I don't know assembly language well enough to optimize it.
;main.c, 14 :: latc = arr_1[porte];
0x9d000064 0x27a30000 addiu r3, sp, 0
0x9d000068 0x3c1ebf88 lui r30, 49032
0x9d00006c 0x8fc26110 lw r2, 24848(r30)
0x9d000070 0x00021080 sll r2, r2, 2
0x9d000074 0x00621021 addu r2, r3, r2
0x9d000078 0x8c420000 lw r2, 0(r2)
0x9d00007c 0x3c1ebf88 lui r30, 49032
0x9d000080 0xafc260a0 sw r2, 24736(r30)
;main.c, 15 :: latd = arr_2[porte];
0x9d000084 0x27a33ffc addiu r3, sp, 16380
0x9d000088 0x3c1ebf88 lui r30, 49032
0x9d00008c 0x8fc26110 lw r2, 24848(r30)
0x9d000090 0x00021080 sll r2, r2, 2
0x9d000094 0x00621021 addu r2, r3, r2
0x9d000098 0x8c420000 lw r2, 0(r2)
0x9d00009c 0x3c1ebf88 lui r30, 49032
;main.c, 16 :: }
0x9d0000a0 0x0b400019 j l_main0
0x9d0000a4 0xafc260e0 sw r2, 24800(r30)
Any suggestions on how to optimize this code?
Edit:
* porte, latc and latd are I/O mapped registers.
* The goal of the code is to change the latc and latd registers as fast as possible when porte changes (so porte is the input, latc and latd are the outputs), and the output depends on the value of porte.
A potential limiting factor is that, since porte, latc and latd are not regular memory but rather I/O registers, it is possible that the I/O bus speed is lower than the memory bus speed and that the processor inserts wait-states between accesses. This may or may not be the case for the PIC32, but the general point is that you need to consider the architecture.
If the I/O bus is not the limitation, then first of all: have you applied compiler optimisations? For such micro-optimisations that is your best bet. This code looks like it should be trivially optimised, but the assembler does not appear to reflect that (although I am no MIPS assembler expert; the compiler's optimiser is, however).
Since the I/O registers are volatile, the optimiser may be prevented from optimising the loop body significantly. However, because they are volatile, your code is also unsafe: it is possible (and indeed likely) that porte will change value between the assignment of latc and the assignment of latd, which may not be intended or desirable. If that is the case, your code should change as follows:
int porte_value_latch = 0 ;

for(;;)
{
    // Non-volatile copy of porte.
    porte_value_latch = porte ;

    // Write latc/d with a consistent porte value that
    // won't change between the assignments, and does not need
    // to be read from memory or I/O.
    latc = arr_1[porte_value_latch] ;
    latd = arr_2[porte_value_latch] ;
}
This is both safe and potentially faster, since the volatile porte is read only once, and the porte_value_latch value can be retained in a temporary register for both array accesses rather than being read from memory each time. The optimiser will most likely turn it into a register access even if regular compilation does not.
The use of for(;;) rather than while(1) makes little difference, but some compilers issue a warning for invariant while expressions and accept the for(;;) idiom quietly. You have not included the assembler for line 13, so it is not possible to determine what your compiler generated for it.
A further possibility for optimisation may be available if latc and latd are located at adjacent addresses, in which case you might use a single array of type unsigned long long int in order to write both locations in a single assignment. Of course the 64 bit access will still be non-atomic, but the compiler may generate more efficient code in this case. It also neatly avoids the need for the porte_value_latch variable, as there is only one reference to porte. However, if latc and latd must be written in a specific order, you lose that level of control. The loop would then look like:
for(;;)
{
    latcd = arr_1_2[porte] ;
}
Here the address of latcd is the low-order address of the adjacent latc and latd registers, and it has type unsigned long long int. If latc has the lower address then:
// latcd aliases the 64 bits starting at the address of latc.
#define latcd (*(volatile unsigned long long int *)&latc)
so that writing latcd writes both latc and latd. You will have to combine arr_1 and arr_2 into a single array of unsigned long long with the appropriate word order, so that each element contains both the C and D values in a single value.
Another suggestion: configure the hardware to read porte into a single memory location using DMA triggered from a clock signal at >= 4 MHz. The loop will then not need to read porte at all, but will read the DMA memory location instead, which may or may not be faster. You could also set up DMA to write latc/latd from a memory location, so that the loop performs no I/O at all. This method would also allow the "adjacent memory" method to work even if latc and latd are not adjacent.
Ultimately, if the issue is down to the compiler's code generation, then implementing the loop in in-line assembler and hand-optimising it may make sense.