“Cannot allocate memory” when starting new Flink job












1















We are running Flink on a 3 VM cluster. Each VM has about 40 Go of RAM. Each day we stop some jobs and start new ones. After some days, starting a new job is rejected with a "Cannot allocate memory" error :




OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000340000000, 12884901888, 0) failed; error='Cannot allocate memory' (errno=12)




Investigations show that the task manager RAM is ever growing, to the point it exceeds the allowed 40 Go, although the jobs are canceled.



I don't have access (yet) to the cluster so I tried some tests on a standalone cluster on my laptop and monitored the task manager RAM:




  • With jvisualvm I can see everything working as intended. I load the job memory, then clean it and wait (a few minutes) for the GB to fire up. The heap is released.

  • Whereas with top, memory is - and stay - high.


enter image description here



At the moment we are restarting the cluster every morning to account for this memory issue, but we can't afford it anymore as we'll need jobs running 24/7.



I'm pretty sure it's not a Flink issue but can someone point me in the right direction about what we're doing wrong here?










share|improve this question























  • you have to gain access! :-)

    – xerx593
    Dec 4 '18 at 11:13
















1















We are running Flink on a 3 VM cluster. Each VM has about 40 Go of RAM. Each day we stop some jobs and start new ones. After some days, starting a new job is rejected with a "Cannot allocate memory" error :




OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000340000000, 12884901888, 0) failed; error='Cannot allocate memory' (errno=12)




Investigations show that the task manager RAM is ever growing, to the point it exceeds the allowed 40 Go, although the jobs are canceled.



I don't have access (yet) to the cluster so I tried some tests on a standalone cluster on my laptop and monitored the task manager RAM:




  • With jvisualvm I can see everything working as intended. I load the job memory, then clean it and wait (a few minutes) for the GB to fire up. The heap is released.

  • Whereas with top, memory is - and stay - high.


enter image description here



At the moment we are restarting the cluster every morning to account for this memory issue, but we can't afford it anymore as we'll need jobs running 24/7.



I'm pretty sure it's not a Flink issue but can someone point me in the right direction about what we're doing wrong here?










share|improve this question























  • you have to gain access! :-)

    – xerx593
    Dec 4 '18 at 11:13














1












1








1


0






We are running Flink on a 3 VM cluster. Each VM has about 40 Go of RAM. Each day we stop some jobs and start new ones. After some days, starting a new job is rejected with a "Cannot allocate memory" error :




OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000340000000, 12884901888, 0) failed; error='Cannot allocate memory' (errno=12)




Investigations show that the task manager RAM is ever growing, to the point it exceeds the allowed 40 Go, although the jobs are canceled.



I don't have access (yet) to the cluster so I tried some tests on a standalone cluster on my laptop and monitored the task manager RAM:




  • With jvisualvm I can see everything working as intended. I load the job memory, then clean it and wait (a few minutes) for the GB to fire up. The heap is released.

  • Whereas with top, memory is - and stay - high.


enter image description here



At the moment we are restarting the cluster every morning to account for this memory issue, but we can't afford it anymore as we'll need jobs running 24/7.



I'm pretty sure it's not a Flink issue but can someone point me in the right direction about what we're doing wrong here?










share|improve this question














We are running Flink on a 3 VM cluster. Each VM has about 40 Go of RAM. Each day we stop some jobs and start new ones. After some days, starting a new job is rejected with a "Cannot allocate memory" error :




OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000340000000, 12884901888, 0) failed; error='Cannot allocate memory' (errno=12)




Investigations show that the task manager RAM is ever growing, to the point it exceeds the allowed 40 Go, although the jobs are canceled.



I don't have access (yet) to the cluster so I tried some tests on a standalone cluster on my laptop and monitored the task manager RAM:




  • With jvisualvm I can see everything working as intended. I load the job memory, then clean it and wait (a few minutes) for the GB to fire up. The heap is released.

  • Whereas with top, memory is - and stay - high.


enter image description here



At the moment we are restarting the cluster every morning to account for this memory issue, but we can't afford it anymore as we'll need jobs running 24/7.



I'm pretty sure it's not a Flink issue but can someone point me in the right direction about what we're doing wrong here?







out-of-memory apache-flink






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 25 '18 at 20:05









Olivier CoillandOlivier Coilland

2,6731019




2,6731019













  • you have to gain access! :-)

    – xerx593
    Dec 4 '18 at 11:13



















  • you have to gain access! :-)

    – xerx593
    Dec 4 '18 at 11:13

















you have to gain access! :-)

– xerx593
Dec 4 '18 at 11:13





you have to gain access! :-)

– xerx593
Dec 4 '18 at 11:13












1 Answer
1






active

oldest

votes


















1














On standalone mode, Flink may not release resources as you wished.
For example, resources holden by static member in an instance.



It is highly recommended using YARN or K8s as runtime env.






share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53471416%2fcannot-allocate-memory-when-starting-new-flink-job%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1














    On standalone mode, Flink may not release resources as you wished.
    For example, resources holden by static member in an instance.



    It is highly recommended using YARN or K8s as runtime env.






    share|improve this answer




























      1














      On standalone mode, Flink may not release resources as you wished.
      For example, resources holden by static member in an instance.



      It is highly recommended using YARN or K8s as runtime env.






      share|improve this answer


























        1












        1








        1







        On standalone mode, Flink may not release resources as you wished.
        For example, resources holden by static member in an instance.



        It is highly recommended using YARN or K8s as runtime env.






        share|improve this answer













        On standalone mode, Flink may not release resources as you wished.
        For example, resources holden by static member in an instance.



        It is highly recommended using YARN or K8s as runtime env.







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 28 '18 at 7:25









        Ruoyu DaiRuoyu Dai

        112




        112
































            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53471416%2fcannot-allocate-memory-when-starting-new-flink-job%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Tonle Sap (See)

            I get strange results when I access the Sqlitedatabase with Unity C# via XAMPP

            Guatemaltekische Davis-Cup-Mannschaft