Segfaults with motif and motif tools

Hello motif-zone - Forum,

I am a computer scientist recently hired by my company to port a 3D Motif application written in C from SGI to Linux. My programming experience (C, C++, Python, Java) is mainly in graphics programming with OpenGL, OpenInventor, VTK, and high-level GUI toolkits (wxWidgets, Qt, Java AWT/Swing, Tcl/Tk). I use openSUSE and hope there are some Motif/X11 experts, or maybe even people who (still?) work with Xmt (Motif Tools), who can help me out:

My problem:
---------------------
The application that I need to port is about 15 years old (the most recent parts are about eight years old). It uses third-party X11 GUI elements programmed by different authors (spinbox1.3, Tab-1.0, Xbae-4.6.1, SciPlot-1.36, Motif Tools 400, libimage) and depends on GLw (real-time 3D OpenGL support), Motif and all the X11 libs.

I can successfully compile and test all the third-party GUI components under Linux with 64-bit libraries (with the exception of Xbae, where some examples fail to display correctly but run fine, with no crashes) and with 32-bit libraries. However, I can only run the final executable as a 32-bit build. The 64-bit build crashes when opening the first window with a buffer overflow in libc.so.6, and gdb tells me that this happens in _XmtCreateChildren/_XtCreateWidget.

I figured that I would stick to the 32-bit build to quickly get a running version of the software, especially since we do not expect much lifetime left in our old SGI.

The 32-bit build starts up fine and I can test the program to a limited extent. My work is stopped by segmentation faults, which happen sometimes at random, sometimes reproducibly. So far I have identified three different kinds, and I would be very grateful if someone could give me hints on the nature of my problem:

First kind:
---------------
(gdb) run
Starting program:
warning: Lowest section in system-supplied DSO at 0xffffe000 is .hash at ffffe0b4

Program received signal SIGSEGV, Segmentation fault.
0xf7b41671 in _XtSortPerDisplayList () from /usr/lib/libXt.so.6
(gdb) where
#0 0xf7b41671 in _XtSortPerDisplayList () from /usr/lib/libXt.so.6
#1 0xf7b4175c in _XtGetPerDisplay () from /usr/lib/libXt.so.6
#2 0xf7b4185d in XtDisplayToApplicationContext () from /usr/lib/libXt.so.6
#3 0xf7d09c1e in XmRenderTableCopy () from /usr/lib/libXm.so.4
#4 0xf7cff80d in XmFontListCopy () from /usr/lib/libXm.so.4
#5 0x081c92a5 in Initialize (request=0xffa49470, init=0x851fb28, args=0x0, num_args=0xffa49424)
at Layout.c:544
#6 0xf7b3f4c8 in ?? () from /usr/lib/libXt.so.6
#7 0xffa49470 in ?? ()
#8 0x0851fb28 in ?? ()
#9 0x00000000 in ?? ()

Second variation:
----------------------------
(gdb) run
Starting program:
warning: Lowest section in system-supplied DSO at 0xffffe000 is .hash at ffffe0b4

Program received signal SIGSEGV, Segmentation fault.
0xf7d50c0b in XmRenderTableCopy () from /usr/lib/libXm.so.4
(gdb) where
#0 0xf7d50c0b in XmRenderTableCopy () from /usr/lib/libXm.so.4
#1 0xf7d4680d in XmFontListCopy () from /usr/lib/libXm.so.4
#2 0xf7c6b4e6 in ?? () from /usr/lib/libXm.so.4
#3 0x08397540 in ?? ()
#4 0x00000000 in ?? ()

Third variety:
----------------------
Starting program:
warning: Lowest section in system-supplied DSO at 0xffffe000 is .hash at ffffe0b4

Program received signal SIGSEGV, Segmentation fault.
0xf7c6c2c7 in ?? () from /usr/lib/libXm.so.4
(gdb) where
#0 0xf7c6c2c7 in ?? () from /usr/lib/libXm.so.4
#1 0x092927c0 in ?? ()
#2 0xff9f08ec in ?? ()
#3 0xff9f08ec in ?? ()
#4 0x00000001 in ?? ()
#5 0x092927c0 in ?? ()
#6 0x00000000 in ?? ()

Is this a known phenomenon? These are the libraries the executable depends on (the third party things like spinbox, sciplot etc. are statically linked as .a's into the final build):

ldd says:
---------------
linux-gate.so.1 => (0xffffe000)
libGLw.so.1 => /usr/lib/libGLw.so.1 (0xf7f16000)
libGLU.so.1 => /usr/lib/libGLU.so.1 (0xf7e9c000)
libGL.so.1 => /usr/lib/libGL.so.1 (0xf7df8000)
libXm.so.4 => /usr/lib/libXm.so.4 (0xf7b97000)
libXmu.so.6 => /usr/lib/libXmu.so.6 (0xf7b80000)
libXt.so.6 => /usr/lib/libXt.so.6 (0xf7b2e000)
libXext.so.6 => /usr/lib/libXext.so.6 (0xf7b1f000)
libX11.so.6 => /usr/lib/libX11.so.6 (0xf7a04000)
libm.so.6 => /lib/libm.so.6 (0xf79df000)
libc.so.6 => /lib/libc.so.6 (0xf78ac000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xf77be000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf77b1000)
libGLcore.so.1 => /usr/lib/libGLcore.so.1 (0xf6c9c000)
libnvidia-tls.so.1 => /usr/lib/tls/libnvidia-tls.so.1 (0xf6c9a000)
libdl.so.2 => /lib/libdl.so.2 (0xf6c96000)
libXp.so.6 => /usr/lib/libXp.so.6 (0xf6c8d000)
libXft.so.2 => /usr/lib/libXft.so.2 (0xf6c79000)
libXrender.so.1 => /usr/lib/libXrender.so.1 (0xf6c70000)
libfontconfig.so.1 => /usr/lib/libfontconfig.so.1 (0xf6c44000)
libfreetype.so.6 => /usr/lib/libfreetype.so.6 (0xf6bd5000)
libz.so.1 => /lib/libz.so.1 (0xf6bc2000)
libjpeg.so.62 => /usr/lib/libjpeg.so.62 (0xf6ba2000)
libpng12.so.0 => /usr/lib/libpng12.so.0 (0xf6b7d000)
libSM.so.6 => /usr/lib/libSM.so.6 (0xf6b73000)
libICE.so.6 => /usr/lib/libICE.so.6 (0xf6b5a000)
libXau.so.6 => /usr/lib/libXau.so.6 (0xf6b56000)
libxcb-xlib.so.0 => /usr/lib/libxcb-xlib.so.0 (0xf6b53000)
libxcb.so.1 => /usr/lib/libxcb.so.1 (0xf6b3a000)
/lib/ld-linux.so.2 (0xf7f43000)
libexpat.so.1 => /lib/libexpat.so.1 (0xf6b18000)

Many thanks in advance,

Greetings,
Martin


Mark

Re: Segfaults with motif and motif tools

Hi,

Unfortunately, you have lots of variables to deal with moving from SGI to OM. One or more of those is probably the root of the problem...

1. SGI is Motif 1.2.x based, OM is Motif 2.3. Lots of changes there. And for fun, the SGI version of Motif was heavily modified to provide theming.
2. It appears that you are trying to run your application compiled in 32 bit mode on a 64 bit system. I am guessing the crash you are seeing when you run your Xbae examples in 64 bit mode is because Xbae is not 64 bit clean (assumption on my part, perhaps others will correct me with facts).
3. Switching compilers...
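
To make "not 64 bit clean" concrete, here is a minimal sketch in plain C. It only shows what happens when a 64-bit pointer value is pushed through a 32-bit integer, a shortcut that older widget code sometimes took with client data. Whether Xbae or any of the other widgets actually does this is an assumption, not something checked here; the function names and the two example addresses are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the value that comes back after a pointer-sized
 * number has been stored in a 32-bit integer and widened again. */
static uint64_t roundtrip_through_32bit(uint64_t address)
{
    uint32_t narrow = (uint32_t)address;  /* upper 32 bits are discarded here */
    return (uint64_t)narrow;
}

/* Returns 1 if the address is unchanged by the round trip, 0 otherwise. */
static int survives_32bit_roundtrip(uint64_t address)
{
    return roundtrip_through_32bit(address) == address;
}

/* Illustrative values only: one address that fits in 32 bits and one
 * that looks like a typical x86-64 shared-library address. */
void demo_roundtrip(void)
{
    assert(survives_32bit_roundtrip(0x0851fb28ULL));
    assert(!survives_32bit_roundtrip(0x00002ae17a3ffbf1ULL));
    assert(roundtrip_through_32bit(0x00002ae17a3ffbf1ULL) == 0x7a3ffbf1ULL);
}
```

In a 32-bit build every address fits, so code with this habit appears to work; in a 64-bit build the truncated value points somewhere else entirely, and the crash shows up later, wherever that pointer is first used.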

Writing widgets in Motif is much harder than in other toolkits. And some widget writers cheat by peeking into internal Motif data structures. The offsets of these internal data structures changed from Motif 1.x to 2.x. So if you try to link Motif widgets built with Motif 1.2 with Motif 2.x, things start crashing. (If you have lesstif installed, you might have widgets that were compiled against it, which is effectively a Motif 1.2 library in terms of internal data structures.)

My sense is that there is a big difference between running the widget examples and building and running your application. Although this is probably not the advice you wanted, if it were me, I would try to break things up and try a smaller subset of your application first.

Mark


ma_meister

Re: Segfaults with motif and motif tools

Thanks for your reply! I did not know I was up to such fierce challenges :)

Do you think I could just get my hands on some Motif 1.2 for my openSUSE and then produce a statically linked executable? This could then be frozen and used until we develop the whole thing anew from scratch, which I think is the best option (especially since advanced open source visualization toolkits, like VTK, are around, making development times shorter).

Is it possible to get the same libraries (or at least Motif) as I have on the SGI, as sources for PC/Linux?

Cheers and thank you all,
Martin


ma_meister

Re: Segfaults with motif and motif tools

Hi again,

I just found out that lesstif is not compatible with Motif 2.1; the lesstif version that is compatible with 1.2 is <0.92, which is not available to my knowledge.
See this:
http://www.lesstif.org/INSTALL.html#DefaultVersions

Any further ideas?
Thank you,
Martin


Mark

Re: Segfaults with motif and motif tools

I was going to suggest that you might try using lesstif for all your Motif needs to avoid the porting/conversion issues. I know you can mix lesstif and OpenMotif.

You can't just use lesstif?

Mark


ma_meister

Re: Segfaults with motif and motif tools

Hi Mark,

I have just tried lesstif and did a simple trial with the Xmt lib (available here). My problem is that with lesstif I cannot even run all the example files (resource files in the examples directory of Xmt, e.g. 20 and 22) that worked fine with the OpenMotif-2.1 installation I had installed.

I then tried to cheat Xmt and created a fake XmVersion so that it compiles as if it were using Motif 1.2, and then linked and ran it with lesstif, but still no change, same errors.

Example:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) where
#0  0x0000000000000000 in ?? ()
#1  0x0000000000424de2 in FocusMoved ()
#2  0x00002ae17a3ffbf1 in XtCallCallbackList () from /usr/lib64/libXt.so.6
#3  0x00002ae17a1165bb in _XmCallFocusMoved () from /home/meistmar/TestBed/lesstif/lib/libXm.so.2
#4  0x00002ae17a117110 in _XmMgrTraversal () from /home/meistmar/TestBed/lesstif/lib/libXm.so.2
#5  0x00002ae17a119c12 in _XmManagerFocusInInternal ()
   from /home/meistmar/TestBed/lesstif/lib/libXm.so.2
#6  0x00002ae17a119d78 in _XmManagerFocusIn () from /home/meistmar/TestBed/lesstif/lib/libXm.so.2
#7  0x00002ae17a434846 in ?? () from /usr/lib64/libXt.so.6
#8  0x00002ae17a434c5b in ?? () from /usr/lib64/libXt.so.6
#9  0x00002ae17a435340 in _XtTranslateEvent () from /usr/lib64/libXt.so.6
#10 0x00002ae17a40d836 in XtDispatchEventToWidget () from /usr/lib64/libXt.so.6
#11 0x00002ae17a40db76 in _XtSendFocusEvent () from /usr/lib64/libXt.so.6
#12 0x00002ae17a40d5a5 in XtDispatchEventToWidget () from /usr/lib64/libXt.so.6
#13 0x00002ae17a40dfc5 in ?? () from /usr/lib64/libXt.so.6
#14 0x00002ae17a40cfca in XtDispatchEvent () from /usr/lib64/libXt.so.6
#15 0x00002ae17a40d165 in XtAppMainLoop () from /usr/lib64/libXt.so.6
#16 0x00000000004089b9 in main (argc=2, argv=0x7fff30d3c958) at mockup.c:279

I am rapidly running out of ideas now. Is there something obvious that I am missing?

---------------
In my tests with my OpenMotif-2.1 installation I have inserted multiple "fprintf(stderr, ..." lines into Xmt, since I know that my crashes originate there, just to trace the errors and when they happen. For example, I get the segfault before the second fprintf gets printed (Layout.c from Xmt):

            fprintf(stderr, "  Layout.c XmFontListCopying ...\n");
            lw->layout.render_table = XmFontListCopy(lw->layout.font_list);
            fprintf(stderr, "  Layout.c XmFontListCopying done\n");

The segfault then is in XmFontListCopy:

#0  0x00002b80623679ab in _XtSortPerDisplayList () from /usr/lib64/libXt.so.6
#1  0x00002b8062367a81 in _XtGetPerDisplay () from /usr/lib64/libXt.so.6
#2  0x00002b8062367b79 in XtDisplayToApplicationContext () from /usr/lib64/libXt.so.6
#3  0x00002b8062025368 in XmRenderTableCopy () from /usr/lib64/libXm.so.4
#4  0x000000000053651f in Initialize ()
#5  0x00002b80623656e9 in ?? () from /usr/lib64/libXt.so.6
#6  0x00002b8062366181 in ?? () from /usr/lib64/libXt.so.6
#7  0x00002b8062366ad6 in _XtCreateWidget () from /usr/lib64/libXt.so.6
#8  0x00002b8062366ebe in XtCreateWidget () from /usr/lib64/libXt.so.6
#9  0x00000000005356bc in XmtCreateLayout ()
#10 0x000000000054266d in XmtCreateWidgetType ()
#11 0x000000000052f1b8 in CreateChild ()
#12 0x000000000052f808 in CreateChildren ()
#13 0x000000000052fadf in _XmtCreateChildren ()
#14 0x000000000052fed6 in _XmtBuildDialog ()
#15 0x000000000052fff1 in XmtBuildQueryDialog ()
#16 0x00000000004f8312 in PipeNetworkModelLoaderCreate (w=0xa464b0, sv=0xa0fcc0, mode=1, model=0x0)
    at pipeNetworkModelWidget.c:2516
#17 0x00002b806235dbf1 in XtCallCallbackList () from /usr/lib64/libXt.so.6
#18 0x00002b8061f6be9d in ?? () from /usr/lib64/libXm.so.4
#19 0x00002b8062392846 in ?? () from /usr/lib64/libXt.so.6
#20 0x00002b8062392c5b in ?? () from /usr/lib64/libXt.so.6
#21 0x00002b8062393340 in _XtTranslateEvent () from /usr/lib64/libXt.so.6
#22 0x00002b806236b836 in XtDispatchEventToWidget () from /usr/lib64/libXt.so.6
#23 0x00002b806236c040 in ?? () from /usr/lib64/libXt.so.6
#24 0x00002b806236afca in XtDispatchEvent () from /usr/lib64/libXt.so.6
#25 0x00002b806236b165 in XtAppMainLoop () from /usr/lib64/libXt.so.6
#26 0x0000000000461f76 in main (argc=0, argv=) 

I am not clear how to go ahead and find this bug. Help!

I guess at some point you have to draw a line and say "This is not my problem anymore". Currently I draw this line just behind libXmt (and so assume that -lXmu -lXext -lXt -lX11 do their job just fine). So I am hoping to find some bug in Xmt, but with each day it seems less likely.

Cheers,
Martin


fredk

Re: Segfaults with motif and motif tools

Try running the program with valkyrie or valgrind to see where memory corruption is happening.
- Fred


ma_meister

Re: Segfaults with motif and motif tools

Hello again,

I have compiled valkyrie and am running tests right now. There is one message from valgrind (memcheck) at startup:

"Invalid write of size 8"
Address xxx is 6,140 bytes inside a block of size 6144 alloc'd

and if I am reading the stack correctly (meaning the uppermost frame of this error is the problem), then this happens in "SetImagePixels32", an Xmt (Motif Tools) helper called from XmtCreatePixmapFromXmtImage.

When I open a first window of the binary the second message is:

"Invalid read of size 8"
Address xxxxx is 0 bytes after a block of size 8 alloc'd

This seems to happen in _XmEntryTextGet (OpenMotif) (called from _XmStringDrawSegment, called from XmStringDrawUnderline) originating from an XtDispatchEvent.

-----------
Description of my segfault:
Now I will produce the segfault, which in short works this way:

I open a "File Dialog", click "Cancel" and open the dialog again.

Here, one of two things happens quasi-randomly: either it segfaults on clicking the option in the menu bar (that should bring up the dialog a second time), or the dialog displays and the font is changed/incorrect. It then lets me choose "Cancel" again but segfaults at the next item that I select from the menu bar.
I try to convince myself that the second way of segfaulting happens when there is more time between mouse press and release at the first click on "Cancel" (could be voodoo, of course).

First try: Fast click on the first "Cancel"
-----------------------------------------------------------
Click on the menu item that produces the dialog:
no new entries in valkyrie (!), so seemingly no problem.
Click on Cancel:

"Invalid read of size 8"
Address xxx is 8 bytes inside a block of 360 free'd
in XtWidgetToApplicationContext (called from XtDestroyWidget)

"Invalid read of size 8"
Address xxx is 152 bytes inside a block of 360 free'd
in XtWidgetToApplicationContext (called from XtDestroyWidget)

"Invalid read of size 1"
Address xxx is 28 bytes inside a block of 360 free'd
in XtWidgetToApplicationContext (called from XtDestroyWidget)

Then I click on this menu item again and the dialog comes up a second time. But I get more than fifty "Invalid read of size 8" messages in XmRenderTableCopy (called from XmtCreateLayout, called from XmtCreateWidgetType), and the next click on Cancel results in an "Invalid free()" in XmRenderTableFree. Strangely, the font is unchanged and I can open/reopen the dialog as often as I want; I guess the valgrind free/new/malloc routines prevent bad things from happening.

I will try to look into the issue with the Pixmap/Image because that seems to be the first bug, although I cannot quite understand what the problem is with the error message: Why is it bad to have a smaller block inside another that is bigger? Or what is the problem in having an address that is 0 bytes after a block that was alloc'ed?

Need to read more valgrind documentation on that.
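
One possible reading of the two startup messages, as a sketch: memcheck reports where an access starts relative to the allocated block, and the access itself has a size. The helper below (a made-up name, plain C, nothing from valgrind itself) just does that arithmetic. An 8-byte write that starts 6,140 bytes into a 6144-byte block runs 4 bytes past the end, and an 8-byte read that starts "0 bytes after" an 8-byte block lies entirely outside it.

```c
#include <assert.h>
#include <stddef.h>

/* How many bytes of an access fall beyond the end of a heap block.
 *   block_size:  number of bytes that were allocated
 *   offset:      where the access starts, counted from the block start
 *   access_size: number of bytes read or written */
static size_t bytes_past_end(size_t block_size, size_t offset, size_t access_size)
{
    size_t end = offset + access_size;
    return end > block_size ? end - block_size : 0;
}

/* The figures from the two memcheck messages quoted above. */
void demo_valgrind_messages(void)
{
    /* "Invalid write of size 8 ... 6,140 bytes inside a block of size 6144" */
    assert(bytes_past_end(6144, 6140, 8) == 4);

    /* "Invalid read of size 8 ... 0 bytes after a block of size 8" */
    assert(bytes_past_end(8, 8, 8) == 8);

    /* For comparison: an 8-byte access that ends exactly at the block end. */
    assert(bytes_past_end(6144, 6136, 8) == 0);
}
```

So the block being "inside" another is not the complaint; the complaint is that the access is wider than the space left in the block. An 8-byte store in a routine named SetImagePixels32 would fit a pixel value being written through a type that is 8 bytes wide in the 64-bit build, but that is a guess and would need checking against the Xmt source.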

Thank you for any comments or enlightening remarks on this,

cheers,
Martin


fredk

Re: Segfaults with motif and motif tools

It sounds like you are referencing a widget after it has been destroyed.


ma_meister

Re: Segfaults with motif and motif tools

Hi there,

I finally solved the mystery. It was an XmFontListFree() in the destructor callback of a third-party widget (DoubleSS) that corrupted the font lists/memory of the following Xmt calls (in XmtCreateChildren etc.). I am not sure why this happens; commenting out this call makes the application work fine, although I suspect there could be a memory leak now (and looking into memcheck there are loads of leaks).
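
One plausible explanation, sketched as a toy model in plain C: if the font list is shared and reference counted, a free without a matching copy releases storage that someone else still points to. The FontList type and the font_list_* functions below are hypothetical stand-ins, not the Motif API, and whether DoubleSS really frees a list it never copied is an assumption that has not been verified.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a shared, reference-counted font list. */
typedef struct {
    int refcount;
} FontList;

static FontList *font_list_create(void)
{
    FontList *fl = malloc(sizeof *fl);
    assert(fl != NULL);
    fl->refcount = 1;
    return fl;
}

/* "Copy" hands out another reference to the same storage. */
static FontList *font_list_copy(FontList *fl)
{
    fl->refcount++;
    return fl;
}

/* Drops one reference; returns 1 if the storage was released. */
static int font_list_free(FontList *fl)
{
    if (--fl->refcount == 0) {
        free(fl);
        return 1;
    }
    return 0;
}

/* Widget copies when created and frees when destroyed: the owner's
 * reference is untouched. Returns the owner's count afterwards. */
static int balanced_widget_lifetime(void)
{
    FontList *owner = font_list_create();
    FontList *mine = font_list_copy(owner);  /* widget creation */
    font_list_free(mine);                    /* widget destruction */
    int remaining = owner->refcount;
    font_list_free(owner);
    return remaining;
}

/* Widget frees a list it never copied. Returns 1 if that released the
 * storage the owner still believes it holds. */
static int unbalanced_widget_lifetime(void)
{
    FontList *owner = font_list_create();
    int released = font_list_free(owner);    /* widget destruction only */
    /* 'owner' is dangling from here on; a later copy or free through it
     * would touch freed memory. */
    return released;
}
```

Under this model the proper fix would be to pair the free with a copy taken when the widget is created, rather than to drop the free. That would also avoid the leak, but it depends on how DoubleSS obtains its font list in the first place.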

Thanks for your ideas and help,

Regards,
Martin