assembly - PIC32 speed: Optimizing C code


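I want suggestions for optimizing this simple code. I need it to be fast, and by fast I mean less than 250 ns.
The first version was slow, about 1000 ns. After some work it takes 550 ns, but I believe it can be done faster and I don't know how :<
I am using a PIC32 with an 80 MHz system clock.
The code: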

    void main() {
        unsigned long int arr_1[4095];
        unsigned long int arr_2[4095];

        // here I assign arr_1 and arr_2 values
        // ...
        // ...

        TRISC = 0;
        TRISD = 0;

        while (1) {
            LATC = arr_1[PORTE];
            LATD = arr_2[PORTE];
        }
    }

As you can see, it is a simple job; the only problem is speed.
I looked at the assembly listing to see how many instructions there are, but I don't know assembly language well enough to optimize it.

    ;main.c, 14 ::      LATC = arr_1[PORTE];
    0x9d000064  0x27a30000  addiu   r3, sp, 0
    0x9d000068  0x3c1ebf88  lui     r30, 49032
    0x9d00006c  0x8fc26110  lw      r2, 24848(r30)
    0x9d000070  0x00021080  sll     r2, r2, 2
    0x9d000074  0x00621021  addu    r2, r3, r2
    0x9d000078  0x8c420000  lw      r2, 0(r2)
    0x9d00007c  0x3c1ebf88  lui     r30, 49032
    0x9d000080  0xafc260a0  sw      r2, 24736(r30)
    ;main.c, 15 ::      LATD = arr_2[PORTE];
    0x9d000084  0x27a33ffc  addiu   r3, sp, 16380
    0x9d000088  0x3c1ebf88  lui     r30, 49032
    0x9d00008c  0x8fc26110  lw      r2, 24848(r30)
    0x9d000090  0x00021080  sll     r2, r2, 2
    0x9d000094  0x00621021  addu    r2, r3, r2
    0x9d000098  0x8c420000  lw      r2, 0(r2)
    0x9d00009c  0x3c1ebf88  lui     r30, 49032
    ;main.c, 16 ::      }
    0x9d0000a0  0x0b400019  j       l_main0
    0x9d0000a4  0xafc260e0  sw      r2, 24800(r30)

Any suggestions for optimizing this code?

Edit:
* PORTE, LATC and LATD are I/O-mapped registers.
* The goal of the code is to change the LATC and LATD registers as fast as possible when PORTE changes (so PORTE is an input, and LATC and LATD are outputs). The outputs depend on the value of PORTE.

A potential limiting factor is that, since PORTE, LATC and LATD are not regular memory but rather I/O registers, it is possible that the I/O bus speed is lower than the memory bus speed and that the processor inserts wait states between accesses. This may or may not be the case for the PIC32, but it is a general point that you need to consider for your architecture.
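To put rough numbers on this (a back-of-envelope sketch, not a measurement): at 80 MHz one cycle is 12.5 ns, so the 250 ns target is a budget of 20 cycles, while the measured 550 ns is about 44 cycles. The listing in the question has 17 instructions per loop iteration, so if each ideally took a single cycle the loop would already fit the budget. That gap is what makes wait states (or some other stall) a plausible suspect. The helper name `cycles_in_ns` below is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: how many whole clock cycles fit into a time
   window of `ns` nanoseconds at a clock of `clock_hz` hertz.
   The 64-bit intermediate avoids overflow in the multiplication. */
uint64_t cycles_in_ns(uint32_t clock_hz, uint32_t ns)
{
    return ((uint64_t)clock_hz * (uint64_t)ns) / 1000000000ULL;
}
```

For example, `cycles_in_ns(80000000u, 250u)` gives 20 and `cycles_in_ns(80000000u, 550u)` gives 44.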

If the I/O bus is not the limitation, then first of all, have you applied compiler optimisations? For such micro-optimisations that is generally your best bet. This code seems trivially optimisable, but the assembler does not appear to reflect that (although I am no MIPS assembler expert; the compiler's optimiser is, however).

Since the I/O registers are volatile, the optimiser may be prevented from optimising the loop body significantly. However, since they are volatile, your code is also unsafe: it is possible (and indeed likely) that PORTE will change value between the assignments to LATC and LATD, which may not be your intention or desirable. If that is the case, your code should change as follows:

    int porte_value_latch = 0 ;
    for(;;)
    {
        // Take a non-volatile copy of PORTE.
        porte_value_latch = PORTE ;

        // Write LATC/D with a consistent PORTE value that
        // won't change between assignments, and does not need
        // to be read from memory or I/O.
        LATC = arr_1[porte_value_latch] ;
        LATD = arr_2[porte_value_latch] ;
    }

This is both safer and potentially faster, since the volatile PORTE is read only once, and the porte_value_latch value can be retained in a temporary register for both array accesses rather than being read from memory each time. The optimiser will almost certainly turn it into a register access even if regular compilation does not.

The use of for(;;) rather than while(1) makes little difference, but some compilers issue a warning for invariant while expressions, yet accept the for(;;) idiom quietly. You have not included the code or the assembler for line 13, so it is not possible to determine what the compiler generated for it.

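A further possibility for optimisation may be available if LATC and LATD are located at adjacent addresses, in which case you might use a single array of type unsigned long long int in order to write both locations in a single assignment. (In the listing in the question, the two stores use offsets 24736 and 24800 from the same base, which suggests the registers are 64 bytes apart on this part, so check the data sheet before relying on this.) Of course the 64-bit access will still be non-atomic, but the compiler may generate more efficient code in this case. It also neatly avoids the need for the porte_value_latch variable, since there is only one reference to PORTE. However, if LATC and LATD must be written in a specific order, you lose that level of control. Your loop would look like: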

    for(;;)
    {
        LATCD = arr_1_2[PORTE] ;
    }

Where the address of LATCD is the low-order address of the adjacent LATC and LATD registers, and it has type unsigned long long int. If LATC has the lower address then:

    // LATCD aliases the 64 bits starting at the address of LATC.
    #define LATCD (*(volatile unsigned long long int*)&LATC)

So writing to LATCD writes both LATC and LATD. You would have to combine arr_1 and arr_2 into a single array of unsigned long long, with the appropriate word order, that contains both the C and D values in a single value.
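As a rough sketch of that combining step, assuming a little-endian target with LATC at the lower address (so the C value goes in the low 32 bits and the D value in the high 32 bits). The function name `pack_tables` is hypothetical, and fixed-width types are used so the word order does not depend on how wide unsigned long int happens to be on a given compiler:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: merge the two lookup tables into one table of
   64-bit entries. Assumption: little-endian target, LATC at the lower
   address, so arr_1 (C values) fills bits 0..31 and arr_2 (D values)
   fills bits 32..63. Swap the two if LATD has the lower address. */
void pack_tables(const uint32_t *arr_1, const uint32_t *arr_2,
                 uint64_t *arr_1_2, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        arr_1_2[i] = ((uint64_t)arr_2[i] << 32) | (uint64_t)arr_1[i];
    }
}
```

This would be run once, before entering the loop, so its cost does not affect the per-iteration timing.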

Another suggestion: configure the hardware to read PORTE into a single memory location using DMA triggered by a clock signal at >= 4 MHz. Your loop then does not need to read PORTE at all, but rather reads the DMA memory location, which may or may not be faster. You could also set up DMA to write LATC/LATD from a memory location, so that your loop performs no I/O at all. This method would also allow the "adjacent memory" method to work even if LATC and LATD are not adjacent.
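With that arrangement the loop body might reduce to something like the sketch below. The DMA channel setup itself is part-specific and is not shown. The names `porte_shadow`, `latcd_shadow` and `lookup_step` are hypothetical: they stand for ordinary RAM locations that the DMA channels are assumed to fill from PORTE and drain to LATC/LATD.

```c
#include <stdint.h>

/* Hypothetical RAM locations (assumptions, not real register names):
   porte_shadow is assumed to be refreshed from PORTE by one DMA channel,
   latcd_shadow is assumed to be copied out to LATC and LATD by others. */
volatile uint32_t porte_shadow;
volatile uint64_t latcd_shadow;

/* One loop iteration: a single volatile read, one table lookup and a
   single 64-bit store, with no direct I/O register access. */
void lookup_step(const uint64_t *arr_1_2)
{
    latcd_shadow = arr_1_2[porte_shadow];
}
```

The main loop would then just call `lookup_step` (or inline its body) inside for(;;). Whether this actually beats direct register access depends on the DMA latency, which would need to be measured.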

Ultimately, if the issue is down to the compiler's code generation, then implementing the loop in in-line assembler and hand-optimising it may make sense.

