Re: TCL force

From: Peter Freddolino (petefred_at_ks.uiuc.edu)
Date: Thu Sep 25 2008 - 08:15:30 CDT

Hi Bin,
I believe all of those flagged blocks of memory are from the startup
routines and won't be leaking during the run (I'd bet that they actually
do get properly cleaned up at some point, but I'm not sure). If you get
no memory leak before adding your new force routines and a leak
afterwards, the new code is the likely culprit. Could you try writing a
simple driver that calls your force evaluation routine a few times, and
run that under valgrind?
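For example, here is a minimal sketch of such a driver (the routine name
calc_extra_forces and its signature are just placeholders; substitute the
actual declaration from your code):

// driver.C -- standalone harness so the force routine can be valgrinded
// outside of NAMD.  The routine name and signature below are illustrative.
#include <vector>

void calc_extra_forces(int natoms, const double *coords, double *forces);

int main() {
  const int natoms = 1000;
  std::vector<double> coords(3 * natoms, 0.1);   // dummy coordinates
  std::vector<double> forces(3 * natoms, 0.0);
  // Call the routine repeatedly so a per-call leak grows visibly.
  for (int step = 0; step < 100; ++step)
    calc_extra_forces(natoms, &coords[0], &forces[0]);
  return 0;
}

Build it with debugging symbols and run it under valgrind, e.g.
  g++ -g driver.C my_force.C -o driver
  valgrind --leak-check=full ./driver
and see whether the leak still shows up without any of the NAMD machinery.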
Peter

BIN ZHANG wrote:
> Hi, Peter:
>
> Following your suggestion, I tried using swig to put a wrapper around
> the C++ force-calculation code. But now I have a new, strange problem:
> a memory leak. I used valgrind to check, and all of the reported
> errors point to NAMD functions.
> Now I'm confused; these should be bugs in my code rather than NAMD's!
> Have you encountered this before?
> Thanks a lot.
>
> Bin
>
> Some of the valgrind output:
> ===============================
>
> ==8765== 100,352 bytes in 1 blocks are possibly lost in loss record
> 483 of 539
> ==8765== at 0x490584B: operator new[](unsigned long)
> (vg_replace_malloc.c:274)
> ==8765== by 0x6DA6D0: SimParameters::print_config(ParseOptions&,
> ConfigList*, char*&) (in /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
> ==8765== by 0x6DF793:
> SimParameters::initialize_config_data(ConfigList*, char*&) (in
> /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
> ==8765== by 0x679F39: NamdState::configListInit(ConfigList*) (in
> /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
> ==8765== by 0x6B96A1: ScriptTcl::initcheck() (in
> /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
> ==8765== by 0x6BBA57: ScriptTcl::run(char*) (in
> /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
> ==8765== by 0x4B9177: main (in /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
> ==8765==
> ==8765==
> ==8765== 880,008 bytes in 1 blocks are possibly lost in loss record
> 511 of 539
> ==8765== at 0x490584B: operator new[](unsigned long)
> (vg_replace_malloc.c:274)
> ==8765== by 0x765E92: LBCommTable::NewTable(int) (in
> /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
> ==8765== by 0x765E65: LBCommTable::LBCommTable() (in
> /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
> ==8765== by 0x76505F: LBDB::ClearLoads() (in
> /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
> ==8765== by 0x75AE2A: LDClearLoads (in
> /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
> ==8765== by 0x658CF6: LdbCoordinator::initialize(PatchMap*,
> ComputeMap*, int) (in /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
> ==8765== by 0x6805BE: Node::startup() (in
> /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
> ==8765== by 0x72E5F7: CkDeliverMessageFree (in
> /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
> ==8765== by 0x72EB22: _invokeEntryNoTrace(int, envelope*, void*) (in
> /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
> ==8765== by 0x72F48A: _invokeEntry(int, envelope*, void*) (in
> /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
> ==8765== by 0x72FD68: _deliverForBocMsg(CkCoreState*, int,
> envelope*, IrrGroup*) (in /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
> ==8765== by 0x72FD3E: _processForBocMsg(CkCoreState*, envelope*)
> (in /home/bingo/BinNAMD/Linux-amd64-MPI/namd2)
> ==8765==
> ==8765== LEAK SUMMARY:
> ==8765== definitely lost: 3,272 bytes in 30 blocks.
> ==8765== indirectly lost: 4,160 bytes in 1 blocks.
> ==8765== possibly lost: 1,267,520 bytes in 382 blocks.
> ==8765== still reachable: 78,079,877 bytes in 129,160 blocks.
> ==8765== suppressed: 0 bytes in 0 blocks.
> ==8765== Reachable blocks (those to which a pointer was found) are not
> shown.
> ==8765== To see them, rerun with: --leak-check=full --show-reachable=yes
>
> =============================
>
>
>
>
>
>
> On Sep 23, 2008, at 9:27 PM, Peter Freddolino wrote:
>
>> This is puzzling...
>> On what do you base the assessment that the problem is occurring in
>> the fortran code?
>> Does your fortran code use a lot of memory? Are the compute nodes
>> identical to the one that you're testing on?
>> Peter
>>
>> BIN ZHANG wrote:
>>> It seems that the segfault comes from my Fortran code, but it is
>>> still very confusing.
>>> In Tcl, it fails at the line "set force [exec "./fortran_force
>>> $variable"]". But if I run "./fortran_force $variable" in a shell,
>>> with $variable set to the value at which NAMD fails, everything is
>>> fine (no error at all).
>>>
>>> Bin
>>>
>>>
>>>
>>> On Sep 23, 2008, at 7:05 AM, Peter Freddolino wrote:
>>>
>>>> Is the segfault coming from your fortran code or namd itself?
>>>>
>>>> If it's coming from namd, it's probably due to issues with floating
>>>> point math in tclforces when running in parallel; there have been
>>>> previous threads on the subject on namd-l, and while I know Jim is
>>>> currently working on it with the tcl developers, for now the best
>>>> solution is probably to move more of the floating point work into
>>>> your compiled external code, since you're already using something
>>>> external anyway.
>>>> Peter
>>>>
>>>> BIN ZHANG wrote:
>>>>> Hi, peter:
>>>>>
>>>>> Thanks a lot for your response.
>>>>> " set force [exec "./fortran_force $variable"] ", this is what I
>>>>> was doing. sorry for the typo.
>>>>> Now I think the problem is: while I was trying to run NAMD with
>>>>> my script in serial, everything is fine. But for parallel
>>>>> running, I always got segmentation violation. Do you have any idea
>>>>> about why?
>>>>> Thanks a lot.
>>>>>
>>>>> Bin
>>>>>
>>>>>
>>>>>
>>>>> On Sep 22, 2008, at 10:37 PM, Peter Freddolino wrote:
>>>>>
>>>>>> Hi Bin,
>>>>>> you certainly can, but you'd need to do something like
>>>>>> set force [exec "./fortran_force $variable"]
>>>>>> i.e., properly call an external command from Tcl -- you'll want to
>>>>>> pay careful attention to how exec works. If you're willing to use
>>>>>> c/c++ instead of fortran, you can also use swig to put a tcl
>>>>>> wrapper around your external force calculation functions, which
>>>>>> is something I do frequently (one can always make this work in
>>>>>> principle by writing a tcl wrapper for your force application
>>>>>> function, but swig makes the process much easier).
>>>>>> With tclforces you don't need to distribute the executable
>>>>>> because tclforces only runs on the head node. With tclbc this
>>>>>> could be a problem if your compute nodes don't have access to the
>>>>>> same filesystem as node0; you'd need to work around this
>>>>>> accordingly.
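>>>>>> To give a rough idea of what the hand-written Tcl wrapper route
>>>>>> looks like (swig generates the equivalent glue for you), here is a
>>>>>> bare-bones sketch; calc_force, Forcewrap, and the
>>>>>> one-scalar-in/one-scalar-out signature are purely illustrative:
>>>>>>
>>>>>> /* forcewrap.C -- minimal hand-written Tcl wrapper around a compiled
>>>>>>  * force routine.  All names here are illustrative. */
>>>>>> #include <tcl.h>
>>>>>>
>>>>>> // Your compiled force routine (hypothetical signature).
>>>>>> extern double calc_force(double x);
>>>>>>
>>>>>> static int CalcForceCmd(ClientData, Tcl_Interp *interp,
>>>>>>                         int objc, Tcl_Obj *const objv[]) {
>>>>>>   if (objc != 2) {
>>>>>>     Tcl_WrongNumArgs(interp, 1, objv, "variable");
>>>>>>     return TCL_ERROR;
>>>>>>   }
>>>>>>   double x;
>>>>>>   if (Tcl_GetDoubleFromObj(interp, objv[1], &x) != TCL_OK)
>>>>>>     return TCL_ERROR;
>>>>>>   Tcl_SetObjResult(interp, Tcl_NewDoubleObj(calc_force(x)));
>>>>>>   return TCL_OK;
>>>>>> }
>>>>>>
>>>>>> // Init routine found by Tcl's [load]; its name must match the package.
>>>>>> extern "C" int Forcewrap_Init(Tcl_Interp *interp) {
>>>>>>   Tcl_CreateObjCommand(interp, "calc_force", CalcForceCmd, NULL, NULL);
>>>>>>   return TCL_OK;
>>>>>> }
>>>>>>
>>>>>> Compile that into a shared library, then in your tclforces script do
>>>>>> something like "load ./libforcewrap.so Forcewrap" followed by
>>>>>> "set force [calc_force $variable]", with no exec involved.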
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>> BIN ZHANG wrote:
>>>>>>> Hi, All:
>>>>>>>
>>>>>>> I was trying to use the Tcl forces utility of NAMD to apply
>>>>>>> some forces to the system. Since the calculation of the force is
>>>>>>> a little complicated, I don't want to use Tcl to compute it.
>>>>>>> So my question is: can I write a Fortran code, compile it as an
>>>>>>> executable, and then, in the calcforces procedure, use
>>>>>>> "set force [./FORTRAN_force $variable]" to get the force?
>>>>>>> (Assuming FORTRAN_force is an executable compiled from the
>>>>>>> source code.)
>>>>>>>
>>>>>>> If I want to run NAMD in parallel, do I need to copy the
>>>>>>> executable to each node explicitly?
>>>>>>> Thanks a lot in advance.
>>>>>>>
>>>>>>> Bin
>>>>>>>
>>>>>>>
>>>>>>>

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:49:53 CST