Prepend header record (or a string / a file) to large file in Scala / Java

What is the most efficient (or recommended) way to prepend a string or a file to another large file in Scala, preferably without using external libraries? The large file can be binary.

E.g.

if prepend string is:
header_information|123.45|xyz\n

and large file is:

abcdefghijklmnopqrstuvwxyz0123456789

abcdefghijklmnopqrstuvwxyz0123456789

abcdefghijklmnopqrstuvwxyz0123456789

...

I would expect to get:

header_information|123.45|xyz

abcdefghijklmnopqrstuvwxyz0123456789

abcdefghijklmnopqrstuvwxyz0123456789

abcdefghijklmnopqrstuvwxyz0123456789

...

asked Nov 20 at 1:09

Andrey Dmitriev

Why not plain unix?
– erip
Nov 20 at 2:45

@erip First, because in this case it would be a workaround, and second, because it will not necessarily always be a Unix filesystem; it can be AWS S3 or something else.
– Andrey Dmitriev
Nov 20 at 9:12

scala io prepend

1 Answer

accepted

I came up with the following solution:

1. Turn the string or file to prepend into an InputStream.

2. Turn the large file into an InputStream.

3. Combine the two InputStreams using java.io.SequenceInputStream.

4. Use java.nio.file.Files.copy to write the combined stream to the target file.

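import java.io.{ByteArrayInputStream, FileInputStream, SequenceInputStream}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths, StandardCopyOption}

object FileAppender {
  def main(args: Array[String]): Unit = {
    // Header bytes, encoded explicitly so they do not depend on the platform default charset
    val stringToPrepend = new ByteArrayInputStream(
      "header_information|123.45|xyz\n".getBytes(StandardCharsets.UTF_8))
    val largeFile = new FileInputStream("big_file.dat")
    // SequenceInputStream yields the header first, then the contents of the large file
    val combined = new SequenceInputStream(stringToPrepend, largeFile)
    try {
      Files.copy(combined, Paths.get("output_file.dat"), StandardCopyOption.REPLACE_EXISTING)
    } finally {
      combined.close() // also closes any underlying stream that is still open
    }
  }
}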

Tested on a ~30 GB file; it took ~40 seconds on a MacBook Pro (3.3 GHz / 16 GB).

This approach can also be used, if necessary, to combine multiple partitioned files, such as those created by the Spark engine.
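
As a rough sketch of that multi-file case, SequenceInputStream also accepts a java.util.Enumeration of streams, so a header and any number of part files can be chained in order. The names ConcatFiles and concat and the part-file paths are hypothetical, made up for illustration; only the java.io and java.nio.file calls are standard library APIs. Each part is opened lazily, when the previous stream has been read to its end.

```scala
import java.io.{ByteArrayInputStream, InputStream, SequenceInputStream}
import java.nio.file.{Files, Path, StandardCopyOption}

// Hypothetical helper, not part of the original answer.
object ConcatFiles {

  /** Writes `header` followed by every file in `parts` (in order) to `target`.
    * Returns the number of bytes written, as reported by Files.copy.
    */
  def concat(header: Array[Byte], parts: Seq[Path], target: Path): Long = {
    // Lazy iterator: a part file is opened only when the enumeration asks for it.
    val streams: Iterator[InputStream] =
      Iterator.single[InputStream](new ByteArrayInputStream(header)) ++
        parts.iterator.map(p => Files.newInputStream(p))

    // Adapt the Scala iterator to the Enumeration that SequenceInputStream expects.
    val enumeration = new java.util.Enumeration[InputStream] {
      def hasMoreElements(): Boolean = streams.hasNext
      def nextElement(): InputStream = streams.next()
    }

    val combined = new SequenceInputStream(enumeration)
    try Files.copy(combined, target, StandardCopyOption.REPLACE_EXISTING)
    finally combined.close()
  }
}
```

A call might look like ConcatFiles.concat(headerBytes, Seq(part0, part1), output), where part0 and part1 are the part files in the order they should appear. Passing an empty header turns this into a plain concatenation.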

answered Nov 20 at 1:43

Andrey Dmitriev
