Prepend header record (or a string / a file) to large file in Scala / Java
What is the most efficient (or recommended) way to prepend a string or a file to another large file in Scala, preferably without using external libraries? The large file can be binary.
For example, if the string to prepend is:
header_information|123.45|xyz\n
and the large file is:
abcdefghijklmnopqrstuvwxyz0123456789
abcdefghijklmnopqrstuvwxyz0123456789
abcdefghijklmnopqrstuvwxyz0123456789
...
I would expect to get:
header_information|123.45|xyz
abcdefghijklmnopqrstuvwxyz0123456789
abcdefghijklmnopqrstuvwxyz0123456789
abcdefghijklmnopqrstuvwxyz0123456789
...
scala io prepend
Why not plain unix?
– erip
Nov 20 at 2:45
@erip Because in this case it would be a workaround, and also because the target will not necessarily always be a Unix filesystem; it can be AWS S3 or something else.
– Andrey Dmitriev
Nov 20 at 9:12
asked Nov 20 at 1:09
Andrey Dmitriev
1 Answer
accepted
I came up with the following solution:
- Turn the string/file to prepend into an InputStream
- Turn the large file into an InputStream
- "Combine" the InputStreams using java.io.SequenceInputStream
- Use java.nio.file.Files.copy to write to the target file
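import java.io.{ByteArrayInputStream, FileInputStream, SequenceInputStream}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths, StandardCopyOption}

object FileAppender {
  def main(args: Array[String]): Unit = {
    // Encode the header explicitly so the bytes do not depend on the platform charset
    val stringToPrepend = new ByteArrayInputStream(
      "header_information|123.45|xyz\n".getBytes(StandardCharsets.UTF_8))
    val largeFile = new FileInputStream("big_file.dat")
    // SequenceInputStream yields the header bytes first, then the large file
    val combined = new SequenceInputStream(stringToPrepend, largeFile)
    try Files.copy(combined, Paths.get("output_file.dat"), StandardCopyOption.REPLACE_EXISTING)
    finally combined.close() // also closes the underlying streams
  }
}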
Tested on a ~30 GB file; it took ~40 seconds on a MacBook Pro (3.3 GHz / 16 GB).
This approach can also be used, if necessary, to combine multiple partitioned files created by, e.g., the Spark engine.
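For the multi-file case, a minimal sketch of the same idea could look like the following. It is an illustration rather than tested production code: the object name FileConcat and the method concat are made up for this example, Scala 2.13+ is assumed (for scala.jdk.CollectionConverters), and the caller is assumed to pass the parts in the order they should appear in the output. A header file can simply be passed as the first part.

```scala
import java.io.{InputStream, SequenceInputStream}
import java.nio.file.{Files, Path, StandardCopyOption}
import scala.jdk.CollectionConverters._

// Hypothetical helper, not part of any library.
object FileConcat {
  /** Concatenates `parts` in the given order into `target` (replacing it if it
    * exists) and returns the number of bytes written.
    */
  def concat(parts: Seq[Path], target: Path): Long = {
    // Iterator.map is lazy, so each part is opened only when
    // SequenceInputStream asks the enumeration for the next stream.
    val streams: Iterator[InputStream] =
      parts.iterator.map(p => Files.newInputStream(p))
    val combined = new SequenceInputStream(streams.asJavaEnumeration)
    try Files.copy(combined, target, StandardCopyOption.REPLACE_EXISTING)
    finally combined.close() // closes whichever underlying stream is still open
  }
}
```

Calling FileConcat.concat with a Seq of paths (for example a header file followed by the partition files) and an output path writes them back to back, streaming the data instead of loading whole files into memory.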
answered Nov 20 at 1:43
Andrey Dmitriev