Subversion Repositories Open64

[/] - Rev 3751

Rev Log message Author Age Path
3751 bb.ld and common.ld are SL specific yug 594d 08h /
3750 Fixed bug 675 - IPA generates incorrect line/file table information:

Currently, when IPA inlines code, the line number information in the
inlined WHIRL statements is replaced with the line number information
associated with the call (see note * below).

The fix is to have IPA generate a global file table along with maps
for each input WHIRL file that maps an index into the input file table
to an index into the global file table.

Then, during the processing done in IP_READ_fix_tree, the file index field in
the SRCPOS record in each WHIRL statement node is updated to use the
appropriate global file table index.
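
The index remapping described above can be sketched as follows. This is a minimal illustration, not the IPA code: the FileTable type, the function names, the fixed table size and the 0-based indices are all assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

#define MAX_FILES 64

/* Hypothetical file table: a flat list of file names (0-based here). */
typedef struct {
    const char *names[MAX_FILES];
    int count;
} FileTable;

/* Return the global index of `name`, appending it if not yet present. */
int global_index_of(FileTable *global, const char *name)
{
    for (int i = 0; i < global->count; i++)
        if (strcmp(global->names[i], name) == 0)
            return i;
    assert(global->count < MAX_FILES);
    global->names[global->count] = name;
    return global->count++;
}

/* Build map[i] = global index for entry i of one input file table. */
void build_file_map(FileTable *global, const FileTable *input, int *map)
{
    for (int i = 0; i < input->count; i++)
        map[i] = global_index_of(global, input->names[i]);
}

/* Rewrite a statement's file index from input-local to global. */
int remap_file_index(const int *map, int local_index)
{
    return map[local_index];
}
```

With one map per input file, a statement inlined from another file keeps pointing at its own source file after its index is rewritten.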

In output_queue::flush() the current state of the global file table is
written to the current .I file. Note that the processing to do this
is done by a newly added routine copy_DST_Type(). Also the routine
merge_directories_and_files(), which AFAICS has always generated bogus
file tables, is no longer called and has been removed.

Reviewed and approved by Gautam.

Note *: A workaround in IPO_INLINE::Process_Op_Code was added with the
PSC 1.3 merge to prevent assembly errors when out-of-range file
indices were detected in the .loc assembly directives (if you take out
this workaround in a previous version of the compiler, the IPA compile
of SPEC xalancbmk will likely fail). This change removes this
workaround.
dgilmore 601d 21h /
3749 Fix [Bug 872] ZDL-- Error loop body
C.R. by YeMei
shenruifen 604d 14h /
3748 Add Stephen Clarke to MAINTAINERS stevec 604d 20h /
3747 Fix bug #876.

This patch is related to bug 827 and rev 3739. For a simple case:
extern void undef();
static void foo() {
    undef(); // <-- A
}
static void bar() {
    int n;
    switch(n) {
        foo(); // <-- B
    }
}
int global() {
    bar();
}

The callsite B is dead code and is eliminated by the GCC FE at O2. With the
patch for bug 827, function foo is still marked as referenced. Function
foo() is then emitted to the BE, which generates the call to "undef", and
the final link fails with: undefined reference to `undef()'. In the previous
patch (rev 3739), when we generate the CALL node to a function, both the
name and the decl of the function are marked as referenced. That causes this
regression. Since the logic that detects whether a static function is used
but not defined only checks the name of the symbol, the new patch marks only
the name of the function as referenced.

Code Review by Sun Chan.
laijx 605d 09h /
3746 Fix bug877.
Fix compiler failure when program does not end with a new line character.

Code Review: Sun-Chan.
zhuqing 605d 10h /
3745 Bug 828 - Output failure in g++ regression suite in SVN3663, Author: Stephen Clarke <stephen.clarke@st.com>, Reviewed by: Jian-Xin Lai codestr0m 608d 08h /
3744 Last open64 merge (2 commits below) was 3669:3741.

The commit log said it was 3640:3669, but that's a mistake. We merged in the
latest commits - r3669:3741 (merge [10] in README.openuh).
dreachem 609d 03h /
3743 Updates to libopenmp tasking implementation

This includes a number of improvements to the tasking implementation in
libopenmp:

- Changes to lessen task queue contention when stealing a task from a
"victim" queue. Victim selection is now uniformly distributed, and if a
victim queue is chosen that is being currently accessed by another thread,
then another queue will be selected.

- Adds a new public/private task pool configuration.

- Adds a lockless queue implementation which uses the gcc builtin for
compare-and-swap. The implementation of __ompc_mfence comes from the
Habanero C runtime. Note: does not currently resize queues when they fill up.

- By default, each thread will now have a dedicated queue for tied and a queue
for untied tasks in the task pool. Using two queues per thread allows for
better adherence to task scheduling constraints from the OpenMP spec.

- More configurable task cutoff configuration (below)

- Reorganizing task pool files (below)
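
The victim selection and compare-and-swap points above can be sketched together as follows. This is a rough illustration, not the libopenmp code: the TaskQueue type, its busy flag, the function names and the xorshift generator are assumptions; only the gcc __sync builtins are real.

```c
#include <assert.h>

/* Hypothetical per-queue state: busy is 1 while a thread is accessing it. */
typedef struct {
    volatile int busy;
} TaskQueue;

/* xorshift32 pseudo-random step; the state must be nonzero. */
unsigned next_random(unsigned *state)
{
    unsigned x = *state;
    assert(x != 0);
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Try to claim a queue without blocking, using the gcc CAS builtin. */
int try_claim(TaskQueue *q)
{
    return __sync_bool_compare_and_swap(&q->busy, 0, 1);
}

/* Release a previously claimed queue. */
void release_queue(TaskQueue *q)
{
    __sync_lock_release(&q->busy);
}

/* Pick victims (approximately) uniformly at random. A queue that is busy
   is skipped and another one is chosen. Returns the index of the claimed
   queue, or -1 if none could be claimed within max_tries attempts. */
int select_victim(TaskQueue *queues, int nqueues, int self,
                  unsigned *seed, int max_tries)
{
    if (nqueues < 2)
        return -1;
    for (int t = 0; t < max_tries; t++) {
        int v = (int)(next_random(seed) % (unsigned)nqueues);
        if (v == self)
            continue;               /* never steal from our own queue */
        if (try_claim(&queues[v]))
            return v;               /* caller must call release_queue() */
    }
    return -1;
}
```

A thief that fails to claim one queue moves on to another instead of waiting, so it never blocks on a queue another thread is using.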

More configurable task cutoff
------------------------------

Also, this adds more configurable task cutoff schemes to runtime. Task
cutoffs may be specified in a more flexible way. The environment variable
O64_OMP_TASK_CUTOFF is used to describe the cutoff scheme. The format is:

cutoff:val[,cutoff:val,[...]]

where cutoff may be any of {always, never, num_threads, switch, depth,
num_children}, and val is used to either disable the corresponding cutoff
(with 0) or specify a value for its limit. If no value is given, then the
default limit value for the cutoff is used.

Description of cutoffs:

always: always cut off explicit tasks (i.e. explicit tasks never get created)
never: never cut off explicit tasks (i.e. explicit tasks always get created)
num_threads: cut off task generation if team size is less than the specified value
switch: cut off task generation if "switching depth" reaches the specified value
depth: cut off task generation if current depth in task tree reaches the specified value
num_children: cut off task generation if the current task's num_children reaches the specified value

Defaults:
num_threads: is enabled and limit is set to 2 (i.e. team size must be 2 or greater)
switch: is enabled and limit is set to 100
depth: is disabled (default limit is 100)
num_children: is disabled (default limit is 100)

Example:
O64_OMP_TASK_CUTOFF=num_threads:2,switch:1000,depth:700,num_children:50

This means that the runtime will cut off task creation if the team's number
of threads is less than 2, if the switching depth reaches 1000, if depth
reaches 700, or if num_children reaches 50.
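
Parsing of this format can be sketched as follows. This is a simplified illustration, not the runtime's parser: the Cutoffs struct and function names are assumptions, a stored limit of 0 stands for "disabled", and the always and never cutoffs are not handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed configuration; a limit of 0 means "disabled". */
typedef struct {
    int num_threads;
    int switch_depth;
    int depth;
    int num_children;
} Cutoffs;

/* Defaults from the log: num_threads (2) and switch (100) are enabled,
   depth and num_children are disabled. */
void cutoffs_init(Cutoffs *c)
{
    c->num_threads = 2;
    c->switch_depth = 100;
    c->depth = 0;
    c->num_children = 0;
}

/* Parse "cutoff:val,cutoff:val,...". An entry without a value selects that
   cutoff's default limit. Returns 0 on success, -1 on an unknown name or
   an overlong string. */
int cutoffs_parse(Cutoffs *c, const char *spec)
{
    char buf[256];
    if (strlen(spec) >= sizeof buf)
        return -1;
    strcpy(buf, spec);

    for (char *item = strtok(buf, ","); item != NULL;
         item = strtok(NULL, ",")) {
        char *colon = strchr(item, ':');
        int has_val = (colon != NULL && colon[1] != '\0');
        int val = has_val ? atoi(colon + 1) : 0;
        int *field;
        int default_limit;

        if (colon != NULL)
            *colon = '\0';          /* terminate the cutoff name */

        if (strcmp(item, "num_threads") == 0) {
            field = &c->num_threads;  default_limit = 2;
        } else if (strcmp(item, "switch") == 0) {
            field = &c->switch_depth; default_limit = 100;
        } else if (strcmp(item, "depth") == 0) {
            field = &c->depth;        default_limit = 100;
        } else if (strcmp(item, "num_children") == 0) {
            field = &c->num_children; default_limit = 100;
        } else {
            return -1;
        }
        *field = has_val ? val : default_limit;
    }
    return 0;
}
```

For the example string above, this sketch yields limits of 2, 1000, 700 and 50; a string such as "switch:0,depth" would disable the switch cutoff and enable depth at its default limit of 100.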

Reorganizing task pool files
----------------------------

In order to reduce clutter in libopenmp, this moves non-default task pool
implementations into the subdirectory other_taskpools. The default task pool
(formerly named per_thread2) is now in omp_task_pool.c. The task pools have
been renamed:

old new
--- ---
per_thread2 default
per_thread1 simple
global simple_2level

Contributors: Deepak, Jim, Priyanka, Yonghong
dreachem 609d 03h /
3742 open64 merge r3640:3669

Merges a number of fixes/enhancements over a 2 month period from Open64 main
trunk into OpenUH. Enhancements include IF-statement vectorization framework,
CG updates for AMD Bulldozer, enabling more if-conversion in WOPT, option to
disable shared library support for improved portability (e.g. to port to
Cygwin/Windows), VCG graph support for procedure CFGs in CG, a
'copyin-copyout' optimization for structure members whose accesses exhibit
poor cache locality, and a new DSP Zero-Delay-Loop (ZDL) implementation. Fixes
for sqrt intrinsic, memory leaks, CODEREP:Set_dtyp_const_val(dt,v), superfluous
region exit blocks, EBO mul operation, integer division simplification,
volatile fields, removing obsolete KEY preprocessor and PURPLE feature, WOPT
seg fault in CFG phase and more are also included.

See the open64 (main trunk) commit log for more details.
dreachem 609d 04h /
3741 Fix bug792, support constraint "b" used in inline asm on IA-64.

Code Review:Jian-xin.
zhuqing 609d 10h /
3740 fix bug 875. itanium dbg build err due to insufficient option space.
Code Review: Lai Jian-xin.
yug 610d 08h /
3739 Fix bug #827.
The error message should be reported at toplev.c, line 884:
874   if (TREE_CODE (decl) == FUNCTION_DECL
875       && DECL_INITIAL (decl) == 0
876       && DECL_EXTERNAL (decl)
877       && ! DECL_ARTIFICIAL (decl)
878       && ! TREE_NO_WARNING (decl)
879       && ! TREE_PUBLIC (decl)
880       && (warn_unused_function
881           || TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (decl))))
882     {
883       if (TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (decl)))
884         pedwarn ("%q+F used but never defined", decl);
885       else
886         warning (0, "%q+F declared %<static%> but never defined", decl);
887       /* This symbol is effectively an "extern" declaration now. */
888       TREE_PUBLIC (decl) = 1;
889       assemble_external (decl);
890     }


Because TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (decl)) is
not set for the callee in this case, the compiler doesn't report any
error. The patch sets this flag when handling the CALL_EXPR while the
tree is translated into spin.

Code review by Gautam.
laijx 612d 11h /
3738 fixed bug 865
On IA64, open64 needs ftz.o to link with -Ofast optimization; this patch builds and installs ftz.o
ycwu 612d 13h /
3737 Fix bug847.
1. Emit global asm code at the point where it is defined
2. When there is global asm in the source file, do not emit .org

Code Review: Jian-Xin.
zhuqing 617d 09h /
3736 fixed typo in revision 3735
code review by Lai Jianxin
ycwu 619d 10h /
3735 A new implementation of open64's support for the DSP Zero-Delay-Loop (ZDL) feature. This ZDL contribution is a cooperative approach between the global scalar optimizer (WOPT) and the code generation component (CG). The common ZDL features, i.e., hiding the loop counting mechanism and abstracting the bottom loop condition, are implemented in WOPT with minimal changes while preserving as much optimization as possible. A pseudo branch op is introduced to expand the ZDL WHIRL operator and delay generation of the ZDL hardware instruction until the very last phase of CG loop optimization. This lazy approach keeps the implementation compatible with other CG loop optimizations, so they can still produce the best code. With the new ZDL implementation, open64 provides expressiveness equal to gcc's doloop_begin, doloop_end and decrement_and_branch_until_zero patterns.

Pragma:
#pragma zdl off

Flags:
WOPT_Enable_ZDL
OPT_Lower_ZDL
WOPT_Enable_ZDL_Early_Exit
WOPT_ZDL_Innermost_Only
CG_zdl_enabled_level[SL]

retargeting macro:
ZDL_TARG

retargeting interface:
INT CG_LOOP_ZDL_Gen(LOOP_DESCR*); //cg_loop.cxx
void Emit_Phase_Validity_Check(void); //cgemit.cxx
void Target_Specific_ZDLBR_Expansion(TN* target_tn); //whirl2ops.cxx

provided options
-WOPT:zdl
-WOPT:zdl_skip_a
-WOPT:zdl_skip_b
-WOPT:zdl_skip_e
-OPT:lower_zdl
-CG:zdl_enabled_level[SL]

Code Review: Fred Chow, Lai Jianxin and Sun Chan.
yug 622d 14h /
3734 fixed for bug 826
The fix is to partially undo the change in revision 3659. That change
includes other fixes in addition to the one for removing 'structure
copies'. This fix undoes only the part for the optimization that
removes structure copies.

Code review by Lai Jianxin
ycwu 623d 12h /
3733 Fix bug851. Add option "TENV:mregparm=%d" to be passed when "mregparm=%d" is used.

Code Review: David Coakley
zhuqing 623d 15h /
3732 Fixed a problem in load folding optimization where the memory location used
in the load operation is overwritten by a store operation. A check is added
to prevent folding of such load operations.

CR by Jian-Xin
shivaram 624d 11h /
