||A new implementation of Open64's support for the DSP Zero-Delay-Loop (ZDL) feature. This ZDL contribution is a cooperative approach between the global scalar optimizer WOPT and the code generation component CG. The common ZDL features, i.e., hiding the loop counting mechanism and abstracting the bottom loop condition, are implemented in WOPT with minimal changes while preserving as much optimization as possible. A pseudo branch op is introduced to expand the ZDL WHIRL operator and delay ZDL hardware instruction generation until the very last phase of CG loop optimization. This lazy implementation helps keep compatibility with the other CG loop optimizations so that they can still produce the best code. With the new ZDL implementation, Open64 provides expressiveness equal to gcc's doloop_begin, doloop_end and decrement_and_branch_until_zero patterns.
#pragma zdl off
INT CG_LOOP_ZDL_Gen(LOOP_DESCR*); //cg_loop.cxx
void Emit_Phase_Validity_Check(void); //cgemit.cxx
void Target_Specific_ZDLBR_Expansion(TN* target_tn); //whirl2ops.cxx
Code Review: Fred Chow, Lai Jianxin and Sun Chan.
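The loop shape that the ZDL entry above targets can be sketched in source form. This is only an illustration, not the WOPT/CG implementation: the function names `sum_counted` and `sum_zdl_shape` are hypothetical, and the second function merely models by hand the "trip count computed once, decrement and branch until zero" form that a hardware loop instruction would hide.

```cpp
#include <cstddef>

// Ordinary counted loop, as a programmer would write it.
int sum_counted(const int* a, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i];
    }
    return sum;
}

// Hand-written model of the zero-delay-loop shape (an assumption made for
// illustration): the trip count is computed once up front, a guard skips the
// loop for zero trips, and the only loop control left is a bottom
// "decrement and branch until zero".
int sum_zdl_shape(const int* a, std::size_t n) {
    int sum = 0;
    std::size_t remaining = n;      // loop counter, set up once
    if (remaining == 0) {
        return sum;                 // zero-trip guard
    }
    const int* p = a;
    do {
        sum += *p++;                // loop body no longer uses the counter
    } while (--remaining != 0);     // decrement_and_branch_until_zero
    return sum;
}
```

On a target with hardware loops, the counter and the bottom branch in the second form are the parts that the hardware would take over, which is why the body must not depend on them.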
||Add extended proactive loop optimizations: Hash nested if-compare expressions to identify loop fusion candidates; Apply if-condition distribution, if-condition tree height reduction and reversed loop unswitching to expose if-merging opportunities; Apply if-merging, proactive loop fusion and loop fusion; Add head/tail duplication of if-regions; Add bit-expression simplification and dead code removal; Add more utilities and debugging flags. CR: Sun Chan
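A source-level sketch of what if-merging followed by loop fusion achieves, for readers unfamiliar with the terms in the entry above. The function names are hypothetical and the transformation is written out by hand; the optimizer works on its own IR, not on source like this.

```cpp
// Before: two adjacent if-regions test the same condition, and each
// contains a loop over the same iteration space.
void scale_then_offset_separate(int* a, int n, bool flag) {
    if (flag) {
        for (int i = 0; i < n; ++i) a[i] *= 2;
    }
    if (flag) {
        for (int i = 0; i < n; ++i) a[i] += 1;
    }
}

// After: if-merging combines the two regions under one test, which places
// the loops next to each other so that loop fusion can combine them.
void scale_then_offset_merged(int* a, int n, bool flag) {
    if (flag) {
        for (int i = 0; i < n; ++i) {
            a[i] *= 2;
            a[i] += 1;
        }
    }
}
```

Fusion is legal here because iteration `i` of the second loop only reads the value written by iteration `i` of the first.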
||SL changes merge r3231:3575. CR by Sun Chan, David Coakley and Lai Jianxin
||Addition of -WOPT:SIB=<on|off> flag and its functionality to support
Scaled-Index-Base address mode generation. Also extended pattern
matching for add and sub instructions to generate inc and dec
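For reference, a scaled-index-base address is computed as base + index * scale + displacement, with the scale limited to 1, 2, 4 or 8 on x86. The helper below is a hypothetical model of that arithmetic (the name `sib_address` is not from the source); it only shows why an array access such as `a[i]` fits the addressing mode.

```cpp
#include <cstdint>

// Models the x86 effective-address computation for a SIB operand:
//   address = base + index * scale + disp, with scale in {1, 2, 4, 8}.
// Hypothetical helper for illustration only.
std::uintptr_t sib_address(std::uintptr_t base, std::uintptr_t index,
                           unsigned scale, std::intptr_t disp) {
    return base + index * scale + static_cast<std::uintptr_t>(disp);
}
```

With `base` holding the address of an `int` array and `scale` equal to `sizeof(int)`, `sib_address(base, i, 4, 0)` is the address of element `i`, so the whole index computation can fold into one memory operand.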
||Open64's loop form, which always tests at the bottom (inverted), does not save code size for while-do and do loops with a big condition and a small body. So we designed a heuristic to filter out those cases and lower them as normal loops with the condition at the head.
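The code-size trade-off in the entry above can be seen by writing both loop forms by hand. The functions below are hypothetical examples, not compiler output, and they do not reproduce the heuristic itself; they only show that inversion duplicates the condition.

```cpp
// Top-test (while-do) form: the three-part condition is emitted once.
int count_top_test(const int* a, int n, int limit) {
    int i = 0;
    while (i < n && a[i] >= 0 && a[i] < limit) {
        ++i;
    }
    return i;
}

// Bottom-test (inverted) form: the same condition appears twice, once as
// the entry guard and once at the bottom of the loop. When the condition
// is large and the body is small, the duplicate dominates the code size.
int count_bottom_test(const int* a, int n, int limit) {
    int i = 0;
    if (i < n && a[i] >= 0 && a[i] < limit) {
        do {
            ++i;
        } while (i < n && a[i] >= 0 && a[i] < limit);
    }
    return i;
}
```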
||Merge changes r2717-3263 from open64-booster branch to trunk.
These changes include work done for AMD's x86 Open64 4.2.4 release.
||Merge all changes through r2711 from open64-booster branch to trunk.
These changes include work done for AMD's x86 Open64 4.2.3 release.
||Merge all changes through r2321 from open64-booster branch to trunk.
These changes include work done for AMD's x86 Open64 4.2.2 release.
||Merge branches/merge08 into trunk.
Now the trunk is the latest revision for Open64 4.2 release.
The trunk now can generate code for 5 platforms:
- IA-64 (Itanium)
- CUDA (from NVIDIA)
- SL (an embedded DSP architecture, from SimpLight)
- MIPS prototype (from ICT based on input from PathScale and SimpLight)
The trunk is merged with the PathScale 3.2 release, with many enhancements and
bug fixes from Tsinghua Univ., NVIDIA, SimpLight, HP and ICT.
||Replace all files in trunk with the merge branch.
Now the files in trunk should be the same as in the merge branch
||Fix the ultimate compile-time bug for 403.gcc -O3 -ipa. This fix calculates the height of the IR tree on the fly in WOPT to get accurate information and stop copy propagation in time. We also add a weight factor to prevent excessive copy propagation from exploding the compile time.
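A minimal sketch of the idea in the entry above, under loudly stated assumptions: the `Expr` type, the function names, and the "height times number of uses" weight are all hypothetical and chosen for illustration. The entry does not say how WOPT defines its weight factor.

```cpp
#include <algorithm>

// Hypothetical binary expression node; WOPT's real IR is different.
struct Expr {
    const Expr* lhs;
    const Expr* rhs;
};

// Height of the expression tree, computed on demand rather than cached.
int tree_height(const Expr* e) {
    if (e == nullptr) return 0;
    return 1 + std::max(tree_height(e->lhs), tree_height(e->rhs));
}

// Assumed cost model: propagating a definition into use_count uses copies
// the tree that many times, so weigh the height by the number of uses and
// stop once the product exceeds a budget.
bool worth_propagating(const Expr* def, int use_count, int max_weight) {
    return tree_height(def) * use_count <= max_weight;
}
```

The point of a cutoff like this is that repeated propagation of tall trees multiplies the size of the IR, which is what drives compile time up.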
||Initial implementation of loop multiversioning
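Loop multiversioning, in general terms, emits two copies of a loop and picks one with a run-time test. The sketch below shows the technique by hand with an overlap check as the test; the function names are hypothetical, and the entry does not say which conditions this implementation versions on.

```cpp
#include <cstdint>

// Run-time test: true when the two n-element ranges do not overlap.
// Addresses are compared as integers to avoid relying on relational
// comparison of pointers into unrelated arrays. Assumes n >= 0.
bool ranges_disjoint(const int* a, const int* b, int n) {
    auto lo_a = reinterpret_cast<std::uintptr_t>(a);
    auto lo_b = reinterpret_cast<std::uintptr_t>(b);
    auto bytes = static_cast<std::uintptr_t>(n) * sizeof(int);
    return lo_a + bytes <= lo_b || lo_b + bytes <= lo_a;
}

void add_into(int* dst, const int* src, int n) {
    if (ranges_disjoint(dst, src, n)) {
        // Version 1: iterations are independent, so a compiler may
        // reorder or vectorize this copy of the loop freely.
        for (int i = 0; i < n; ++i) dst[i] += src[i];
    } else {
        // Version 2: the original loop, executed strictly in order so
        // that overlapping accesses keep their original meaning.
        for (int i = 0; i < n; ++i) dst[i] += src[i];
    }
}
```

Both versions have the same source here; the benefit comes from what the optimizer is allowed to assume inside the first one.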
||Rename kpro64 to osprey. The makefile will not work for several hours
||part of points-to summary, not yet enabled
||Optimization based on __attribute__((noreturn)) semantics
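As background for the entry above, `__attribute__((noreturn))` is the GCC-style annotation telling the compiler that a call never returns, so control flow after the call is unreachable. The functions below are hypothetical examples; the entry does not describe which optimizations were added.

```cpp
#include <cstdlib>

// Declared noreturn: the compiler may assume control never comes back
// from a call to this function.
__attribute__((noreturn)) void fatal_error() {
    std::abort();
}

int checked_div(int a, int b) {
    if (b == 0) {
        fatal_error();
        // Anything placed here would be unreachable, and the compiler
        // does not need a path from this block to the return below.
    }
    return a / b;   // on every path that reaches here, b != 0
}
```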
||The 'merge' branch is now our new trunk.
||Transfer r782 and r783 from trunk. r783 is a fix for the bug in r782
||Check in the merge of the source code developed in pathscale-3.0, which includes at least:
Prefetch Invariant (non-constant) Stride
Prefetch_stride_ahead & Prefetch_invariant_stride
Enable_compose_bits / Enable_extract_bits
perform floating-point memory copies using integer registers
Use alternate malloc algorithm
WOPT_Enable_Str_Red_Use_Context = TRUE; /* use loop content in SR decision */
You can refer to pathscale's release notes for more details.
||Check in the new merged files, including Shuxin's latest check-ins.
||Remove directory merge/trunk, added by mistake.