Copying data from S3 to Redshift hangs
I've been trying to load data into Redshift for the last couple of days with no success. I have attached the correct IAM role to the cluster, I have granted access to S3, and I am using the COPY command with either AWS credentials or the IAM role, all with no success so far. What could be the reason for this? It has come to the point where I don't have many options left.
So the code is pretty basic, nothing fancy there. See below:
copy test_schema.test from 's3://company.test/tmp/append.csv.gz'
iam_role 'arn:aws:iam::<rolenumber>/RedshiftCopyUnload'
delimiter ',' gzip;
I haven't included any error messages because there are none: the command simply hangs, and I have left it running for well over 40 minutes with no result. If I go into the Queries section of the Redshift console, I don't see anything abnormal. I am using Aginity and SQL Workbench to run the queries.
I also tried inserting rows manually into Redshift, and that works. COPY and UNLOAD do not, and even though I have created roles with access to S3 and associated them with the cluster, I still get this problem.
Thoughts?
EDIT: Solution found. It was a connectivity problem within our VPC: a VPC endpoint for S3 had to be created and associated with the subnet used by Redshift.
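For reference, a rough sketch of that fix with the AWS CLI; the VPC ID, route table ID, and region below are placeholders rather than our actual values:

# Create a gateway VPC endpoint so traffic from the Redshift subnet to S3 stays inside the VPC.
aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0123456789abcdef0 \
    --service-name com.amazonaws.us-east-1.s3 \
    --route-table-ids rtb-0123456789abcdef0

# Confirm the endpoint was created and is attached to the expected route table.
aws ec2 describe-vpc-endpoints \
    --filters Name=service-name,Values=com.amazonaws.us-east-1.s3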
amazon-web-services amazon-s3 amazon-redshift
asked Nov 20 at 23:19, edited Nov 22 at 14:13 by Dimitris

To be able to assist you, we will need more information. Please edit your question to include a sample of the input data, the COPY command you are using, the error message you are receiving, plus any other information that would help us diagnose/reproduce your situation.
– John Rotenstein
Nov 20 at 23:58

As well as the above, if you can log on to the AWS console, check the resource usage of Redshift (I/O and CPU), and check whether the load job there is in the "Running" state. Also share the DDL of the target table (including sort/dist keys), the known sort order of your input data, and lastly the number of rows and the size of your cluster.
– Jon Scott
Nov 21 at 7:12
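A quick way to run Jon Scott's "is it running" check from a SQL client, rather than the console, is the stv_recents system view, which lists the queries Redshift currently has in flight:

-- Queries the cluster currently considers running.
select pid, user_name, starttime, duration, query
from stv_recents
where status = 'Running';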
1 Answer
I agree with John Rotenstein that more information is needed to give a definitive answer. I would suggest you start with simple data points and a simple table.
Here is a step-by-step walkthrough; I hope that by following it you can isolate your issue.
Assume this is your table structure. I'm deliberately covering several data types to make the point:
create table sales(
salesid integer,
commission decimal(8,2),
saledate date,
description varchar(255),
created_at timestamp default sysdate,
updated_at timestamp);
To keep it simple, assume this is your data file residing in S3. Note that the content is pipe-delimited, which matches COPY's default delimiter.
Content of sales-example.txt:
salesid,commission,saledate,description,created_at,updated_at
1|3.55|2018-12-10|Test description|2018-05-17 23:54:51|2018-05-17 23:54:51
2|6.55|2018-01-01|Test description|2018-05-17 23:54:51|2018-05-17 23:54:51
4|7.55|2018-02-10|Test description|2018-05-17 23:54:51|2018-05-17 23:54:51
5|3.55||Test description|2018-05-17 23:54:51|2018-05-17 23:54:51
7|3.50|2018-10-10|Test description|2018-05-17 23:54:51|2018-05-17 23:54:51
Run the following two commands using the psql terminal or any SQL client. Make sure to run the second command (the commit) as well:
copy sales(salesid,commission,saledate,description,created_at,updated_at) from 's3://example-bucket/foo/bar/sales-example.txt' credentials 'aws_access_key_id=************;aws_secret_access_key=***********' IGNOREHEADER 1;
commit;
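If the COPY fails outright instead of hanging, the first place to look is the stl_load_errors system table, which records the file, line, column, and reason for each rejected row:

-- Most recent load errors, newest first.
select starttime, filename, line_number, colname, err_reason
from stl_load_errors
order by starttime desc
limit 10;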
I hope this helps you debug your issue.

answered Nov 21 at 2:02 by Red Boy
Thanks, but that didn't work either. I ran it, and all I can see in Redshift Queries is COPY ANALYZE sales, while my SQL client tools hang.
– Dimitris
Nov 21 at 10:11
Does a plain SELECT run fine, e.g. select * from sales? If so, insert a few records manually and then try an UNLOAD to the same bucket. That should tell you whether there is a connectivity issue.
– Red Boy
Nov 21 at 11:30
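A minimal sketch of that UNLOAD test against the example sales table above; the bucket path and role ARN are placeholders:

-- Write the table back out to S3; if this hangs too, suspect connectivity rather than permissions.
unload ('select * from sales')
to 's3://example-bucket/unload-test/sales_'
iam_role 'arn:aws:iam::123456789012:role/RedshiftCopyUnload'
delimiter '|';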
So I tried everything, and I think there is a connectivity issue between Redshift and S3. INSERT queries work perfectly fine; UNLOAD and COPY don't. I run this through our VPC and I have set up a role with read-only access to S3, but I'm not sure what the issue is. The role is associated with the cluster and everything looks good, but still no COPY.
– Dimitris
Nov 21 at 12:59