Re: TCL force

From: BIN ZHANG (zhngbn_at_gmail.com)
Date: Wed Sep 24 2008 - 14:23:38 CDT

Hi, Peter:

     Following your suggestion, I tried using SWIG to put a Tcl wrapper
around my C++ force calculation code. But now I have a new, weird
problem: a memory leak. I ran valgrind to check, and all the errors
point to NAMD functions.
     Now I'm confused: these should be bugs in my code rather than
NAMD's!
     Have you encountered this before?
     Thanks a lot.

Bin

Some output from valgrind:
===============================

==8765== 100,352 bytes in 1 blocks are possibly lost in loss record 483 of 539
==8765==    at 0x490584B: operator new[](unsigned long) (vg_replace_malloc.c:274)
==8765==    by 0x6DA6D0: SimParameters::print_config(ParseOptions&, ConfigList*, char*&) (in /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
==8765==    by 0x6DF793: SimParameters::initialize_config_data(ConfigList*, char*&) (in /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
==8765==    by 0x679F39: NamdState::configListInit(ConfigList*) (in /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
==8765==    by 0x6B96A1: ScriptTcl::initcheck() (in /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
==8765==    by 0x6BBA57: ScriptTcl::run(char*) (in /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
==8765==    by 0x4B9177: main (in /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
==8765==
==8765==
==8765== 880,008 bytes in 1 blocks are possibly lost in loss record 511 of 539
==8765==    at 0x490584B: operator new[](unsigned long) (vg_replace_malloc.c:274)
==8765==    by 0x765E92: LBCommTable::NewTable(int) (in /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
==8765==    by 0x765E65: LBCommTable::LBCommTable() (in /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
==8765==    by 0x76505F: LBDB::ClearLoads() (in /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
==8765==    by 0x75AE2A: LDClearLoads (in /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
==8765==    by 0x658CF6: LdbCoordinator::initialize(PatchMap*, ComputeMap*, int) (in /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
==8765==    by 0x6805BE: Node::startup() (in /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
==8765==    by 0x72E5F7: CkDeliverMessageFree (in /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
==8765==    by 0x72EB22: _invokeEntryNoTrace(int, envelope*, void*) (in /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
==8765==    by 0x72F48A: _invokeEntry(int, envelope*, void*) (in /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
==8765==    by 0x72FD68: _deliverForBocMsg(CkCoreState*, int, envelope*, IrrGroup*) (in /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
==8765==    by 0x72FD3E: _processForBocMsg(CkCoreState*, envelope*) (in /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
==8765==
==8765== LEAK SUMMARY:
==8765==    definitely lost: 3,272 bytes in 30 blocks.
==8765==    indirectly lost: 4,160 bytes in 1 blocks.
==8765==    possibly lost: 1,267,520 bytes in 382 blocks.
==8765==    still reachable: 78,079,877 bytes in 129,160 blocks.
==8765==    suppressed: 0 bytes in 0 blocks.
==8765== Reachable blocks (those to which a pointer was found) are not shown.
==8765== To see them, rerun with: --leak-check=full --show-reachable=yes

=============================
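
One way to narrow this down is to make sure the wrapped C++ side owns no
raw new[] buffers at all, so that any remaining leak records cannot come
from the force code. Below is a minimal sketch of that idea, not the
actual code from this post: the function name harmonic_force, its
arguments, and the harmonic form F = -k * r are all hypothetical
stand-ins for whatever the real force calculation does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical force routine (an assumption, not taken from the post):
// a harmonic restraint toward the origin, F = -k * r, per component.
// The result lives in a std::vector returned by value, so its storage is
// released automatically and there is no new[]/delete[] pair to mismatch.
std::vector<double> harmonic_force(const std::vector<double>& pos, double k) {
    assert(k >= 0.0);  // precondition chosen for this sketch only
    std::vector<double> force(pos.size());
    for (std::size_t i = 0; i < pos.size(); ++i) {
        force[i] = -k * pos[i];
    }
    return force;
}
```

With this sketch, harmonic_force({1.0, -2.0, 0.5}, 2.0) gives the
components -2, 4, -1. If valgrind still reports the same records with a
routine like this in place, they are unlikely to originate in the
wrapped function.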

On Sep 23, 2008, at 9:27 PM, Peter Freddolino wrote:

> This is puzzling...
> On what do you base the assessment that the problem is occurring in
> the Fortran code?
> Does your Fortran code use a lot of memory? Are the compute nodes
> identical to the one that you're testing on?
> Peter
>
> BIN ZHANG wrote:
>> It seems that the segfault comes from my Fortran code, but it's
>> still very confusing.
>> In Tcl, it fails at the line " set force [exec "./fortran_force
>> $variable"] ". But if I run " ./fortran_force $variable " in the
>> shell, with $variable set to the value at which NAMD fails,
>> everything is fine (no error at all).
>>
>> Bin
>>
>>
>>
>> On Sep 23, 2008, at 7:05 AM, Peter Freddolino wrote:
>>
>>> Is the segfault coming from your Fortran code or NAMD itself?
>>>
>>> If it's coming from NAMD, it's probably due to issues with
>>> floating point math in tclforces when running in parallel; there
>>> have been previous threads on the subject on namd-l, and while I
>>> know Jim is currently working on it with the Tcl developers, for
>>> now the best solution is probably to move more of the floating
>>> point work into your compiled external code, since you're already
>>> using something external anyway.
>>> Peter
>>>
>>> BIN ZHANG wrote:
>>>> Hi, Peter:
>>>>
>>>> Thanks a lot for your response.
>>>> " set force [exec "./fortran_force $variable"] " is what I was
>>>> doing; sorry for the typo.
>>>> Now I think the problem is this: when I run NAMD with my script
>>>> in serial, everything is fine. But when running in parallel, I
>>>> always get a segmentation violation. Do you have any idea why?
>>>> Thanks a lot.
>>>>
>>>> Bin
>>>>
>>>>
>>>>
>>>> On Sep 22, 2008, at 10:37 PM, Peter Freddolino wrote:
>>>>
>>>>> Hi Bin,
>>>>> you certainly can, but you'd need to do something like
>>>>> set force [exec "./fortran_force $variable"]
>>>>> i.e., properly call an external command from Tcl -- you'll want to
>>>>> pay careful attention to how exec works. If you're willing to
>>>>> use C/C++ instead of Fortran, you can also use SWIG to put a Tcl
>>>>> wrapper around your external force calculation functions, which
>>>>> is something I do frequently (one can always make this work in
>>>>> principle by writing a Tcl wrapper for your force application
>>>>> function, but SWIG makes the process much easier).
>>>>> With tclforces you don't need to distribute the executable
>>>>> because tclforces only runs on the head node. With tclbc this
>>>>> could be a problem if your compute nodes don't have access to
>>>>> the same filesystem as node0; you'd need to work around this
>>>>> accordingly.
>>>>>
>>>>> Peter
>>>>>
>>>>> BIN ZHANG wrote:
>>>>>> Hi, All:
>>>>>>
>>>>>> I was trying to use the Tcl forces facility of NAMD to apply
>>>>>> some forces to the system. Since the force calculation is a
>>>>>> little complicated, I don't want to do it in Tcl. So my
>>>>>> question is: can I write Fortran code, compile it as an
>>>>>> executable, and then, in the calcforces procedure, use
>>>>>> set force [./FORTRAN_force $variable] to get the force?
>>>>>> (assuming FORTRAN_force is an executable compiled from
>>>>>> source code).
>>>>>>
>>>>>> If I want to run NAMD in parallel, do I need to copy the
>>>>>> executable to each node explicitly?
>>>>>> Thanks a lot in advance.
>>>>>>
>>>>>> Bin
>>>>>>
>>>> -------------------------------------------------------------
>>>> The tree of liberty must be refreshed from time to time with the
>>>> blood of patriots and tyrants.

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:49:52 CST