History of YouTube channel's subscriber count
I'm trying to collect a channel's subscriber count over time so I can fit some cool graphs to it. The program is pretty crude: it just pulls the HTML from https://socialblade.com/youtube/user/pewdiepie/realtime and finds where the live sub count is. For some reason the HTML I get only changes about once an hour, so I don't get data as frequently as I'd like (does this have something to do with caching?). I don't know much about how networking works in Java. I was just trying to put something together, because I really wanted an easy way to get the data so that I can apply some machine learning or LoggerPro curve fitting to it. Searching on Google didn't turn up an easy fix, as I'm not really sure what the problem even is. Also, could it be considered a DoS attack or something if I automatically connect to their site every 10 seconds or so?
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.URL;

public class Main {
    public static void main(String[] args) throws Exception {
        //String data = "";
        PrintWriter out = new PrintWriter("pewDieSubs" + System.currentTimeMillis() + ".txt");
        long lastTime = System.currentTimeMillis();
        long deltaTime = 0;
        System.setProperty("http.agent", "Chrome");
        URL url = new URL("https://socialblade.com/youtube/user/pewdiepie/realtime");
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
        String inputLine;
        String lastInputLine = "";
        while ((inputLine = in.readLine()) != null) {
            // The hidden <p id="rawCount"> element holds the raw subscriber count
            if (inputLine.contains("<p id=\"rawCount\" style=\"display: none;\">")) {
                if (!inputLine.equals(lastInputLine)) {
                    lastInputLine = inputLine;
                    deltaTime = System.currentTimeMillis() - lastTime;
                    lastTime = System.currentTimeMillis();
                    System.out.println(inputLine);
                    // Assumes the line starts with the 40-character tag and the count has 8 digits
                    String tmp = "";
                    for (int i = 0; i < 8; i++) {
                        tmp = tmp + inputLine.charAt(40 + i);
                    }
                    System.out.println(tmp + " --- deltaTime = " + deltaTime);
                    //data = data + "\n" + lastTime + " " + tmp;
                    out.println(lastTime + " " + tmp);
                    out.flush();
                }
                // Download the page again and keep reading from the fresh copy
                in.close();
                in = new BufferedReader(new InputStreamReader(url.openStream()));
                Thread.sleep(10000);
            }
        }
        in.close();
        out.close();
    }
}
java web-scraping
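The fixed offset `charAt(40 + i)` in the question's code only works if the tag starts at column 0 and the count has exactly eight digits. A minimal, more tolerant sketch is to pull the digits out with a regular expression, applied either to one line or to the whole downloaded page. It assumes the page still contains a `<p id="rawCount" ...>` element whose text is the plain number, which may not hold for Social Blade's current markup; the class name `RawCountParser` is hypothetical.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts the number inside <p id="rawCount" ...>. */
public class RawCountParser {
    // Matches the rawCount paragraph regardless of its other attributes or column,
    // then captures the digits (thousands separators allowed) that follow the tag.
    private static final Pattern RAW_COUNT =
            Pattern.compile("<p[^>]*\\bid=\"rawCount\"[^>]*>\\s*(\\d[\\d,]*)");

    /** Returns the count found in the given HTML, or empty if the element is missing. */
    public static OptionalLong extract(String html) {
        Matcher m = RAW_COUNT.matcher(html);
        if (!m.find()) {
            return OptionalLong.empty();
        }
        // Strip separators before parsing, so "77,123,456" and "77123456" both work.
        return OptionalLong.of(Long.parseLong(m.group(1).replace(",", "")));
    }

    public static void main(String[] args) {
        String line = "<p id=\"rawCount\" style=\"display: none;\">73412345</p>";
        System.out.println(extract(line).getAsLong()); // prints 73412345
    }
}
```

Returning an empty `OptionalLong` instead of throwing makes it easy to notice when the page layout changes, rather than silently logging eight wrong characters.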
There are several problems with your approach. First of all, the way you retrieve a channel's views is quite tiresome and error-prone. YouTube offers an API that can provide this information, which saves you from scraping and parsing raw HTML. If you need to check periodically, you need to create a new thread that runs periodically. What you do at the moment is run everything in the main thread, which you then instruct to sleep for 10 seconds. Your process is not repeatable and will only run once. I suggest you take a look at both the API and Java threads.
– Aris_Kortex
Nov 24 '18 at 14:07
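One way to follow the thread suggestion is a `ScheduledExecutorService`, which runs a task on a background thread at a fixed delay instead of calling `Thread.sleep` inside the read loop. This is a minimal sketch; `Poller`, `pollEvery` and its `maxRuns` limit are hypothetical, and the limit exists only so the example terminates. On the DoS question: one request every 10 seconds is 86,400 / 10 = 8,640 requests per day, which is unlikely to be a denial of service in any technical sense, but automated polling may still break a site's terms of use, so check those first.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class Poller {
    /**
     * Hypothetical helper: runs task every periodMillis on a background thread,
     * stops after maxRuns executions, and returns how many runs completed.
     */
    public static int pollEvery(Runnable task, long periodMillis, int maxRuns) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch remaining = new CountDownLatch(maxRuns);
        AtomicInteger runs = new AtomicInteger();
        // scheduleWithFixedDelay waits periodMillis after each run finishes,
        // so a slow download never causes overlapping requests.
        scheduler.scheduleWithFixedDelay(() -> {
            if (runs.get() >= maxRuns) {
                return; // already done; the caller is about to shut the scheduler down
            }
            try {
                task.run();
            } catch (RuntimeException e) {
                // An exception escaping here would silently cancel all future runs.
                System.err.println("poll failed: " + e);
            }
            runs.incrementAndGet();
            remaining.countDown();
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            remaining.await(); // the calling thread only waits; the work happens elsewhere
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        } finally {
            scheduler.shutdownNow();
        }
        return runs.get();
    }

    public static void main(String[] args) {
        // In the real program the task would download the page and log the count;
        // here it only prints a timestamp, three times, 10 seconds apart.
        int done = pollEvery(() -> System.out.println(System.currentTimeMillis()), 10_000, 3);
        System.out.println("runs: " + done);
    }
}
```

Catching exceptions inside the task matters for a long-running logger: a single failed download would otherwise stop the schedule without any error message.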
@Aris_Kortex Believe it or not, I have run the code, and it does not run only once: it repeatedly gets the HTML. As I said, the problem is that the HTML it gets back only changes about once an hour. It does change and I do get new data, just not as frequently as I'd like. You are right that I should probably use the YouTube API. I just thought that if I could get this working with a minor fix, I could move on to what I actually wanted to do with the data. It would also be interesting to get this to work so that I could use it in the future for an application where there's no API.
– Auruttch
Nov 24 '18 at 14:53
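Whether the stale data comes from a cache can be checked from the response headers. As far as I know, `HttpURLConnection` does not cache responses by itself unless a `java.net.ResponseCache` has been installed, so if the hourly updates persist, the value is most likely cached or only refreshed on the server or CDN side, and nothing the client sends will change that. One plausible explanation is that the "realtime" number is updated by JavaScript after the page loads, which a plain download never runs. This sketch asks for a fresh copy with `Cache-Control: no-cache`, adds a throwaway query parameter (the name `_` is arbitrary and hypothetical) so an intermediate cache sees a new URL, and prints the caching-related headers. Whether Social Blade honours any of this is an assumption; it needs Java 9+ for `readAllBytes`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class CacheCheck {
    /** Appends a hypothetical "_" query parameter so each request has a unique URL. */
    public static String withCacheBuster(String url, long nonce) {
        String separator = url.contains("?") ? "&" : "?";
        return url + separator + "_=" + nonce;
    }

    /** Downloads url while asking caches not to serve a stored copy; prints cache headers. */
    public static void fetchAndReport(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                new URL(withCacheBuster(url, System.currentTimeMillis())).openConnection();
        conn.setUseCaches(false);                        // bypass any installed ResponseCache
        conn.setRequestProperty("Cache-Control", "no-cache");
        conn.setRequestProperty("Pragma", "no-cache");   // HTTP/1.0 equivalent
        conn.setRequestProperty("User-Agent", "Chrome"); // same value the question uses
        System.out.println("HTTP " + conn.getResponseCode());
        // A non-zero Age usually means a shared cache (e.g. a CDN) answered, not the origin;
        // Cache-Control and Expires show how long a stored copy may be reused.
        for (String name : new String[] {"Age", "Cache-Control", "Expires", "Last-Modified", "Date"}) {
            System.out.println(name + ": " + conn.getHeaderField(name));
        }
        try (InputStream body = conn.getInputStream()) {
            System.out.println("body bytes: " + body.readAllBytes().length);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        fetchAndReport("https://socialblade.com/youtube/user/pewdiepie/realtime");
    }
}
```

If `Age` stays at zero and the count still only changes hourly, the client side is probably not the problem and a different data source is needed.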
Got it working using the YouTube API. Thank you @Aris_Kortex
– Auruttch
Nov 25 '18 at 13:26
I can turn this into a proper answer so that you can upvote it if you like.
– Aris_Kortex
Nov 28 '18 at 17:02
@Aris_Kortex Well, I actually used the API in a kind of cheap way and just scraped the response from googleapis.com/youtube/v3/…. I didn't change the code much, but the data I get from this site updates every time I fetch it again. I really just wanted to move on to analyzing the data, and this works well enough. I still don't know why Social Blade doesn't give me fresh data, so I don't know if I've solved the question I asked in the first place. You can still post a proper answer if you think it could be helpful to others.
– Auruttch
Dec 6 '18 at 20:22
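For the API route, the documented `channels.list` endpoint of the YouTube Data API v3 returns a `statistics` object containing `subscriberCount`. This is not necessarily the URL used in the comment above (it is elided there); it is a minimal sketch that assumes you have an API key (the `YOUTUBE_API_KEY` environment variable name is hypothetical), looks the channel up by the legacy username `pewdiepie`, and uses a regular expression instead of a JSON library only to stay dependency-free, which is acceptable for one field but brittle in general. It needs Java 10+ for `URLEncoder.encode(String, Charset)`. Also note that, as far as I know, YouTube has since started returning abbreviated public subscriber counts (rounded to three significant figures), which limits the resolution available for curve fitting.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class YouTubeSubs {
    // The API returns subscriberCount as a JSON string, e.g. "subscriberCount": "77000000".
    private static final Pattern SUBSCRIBERS =
            Pattern.compile("\"subscriberCount\"\\s*:\\s*\"?(\\d+)\"?");

    /** Builds a channels.list request that looks a channel up by its legacy username. */
    public static String channelStatsUrl(String username, String apiKey) {
        return "https://www.googleapis.com/youtube/v3/channels?part=statistics"
                + "&forUsername=" + URLEncoder.encode(username, StandardCharsets.UTF_8)
                + "&key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
    }

    /** Pulls statistics.subscriberCount out of the JSON response, if present. */
    public static OptionalLong parseSubscriberCount(String json) {
        Matcher m = SUBSCRIBERS.matcher(json);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }

    public static void main(String[] args) throws IOException {
        String key = System.getenv("YOUTUBE_API_KEY"); // hypothetical variable name
        if (key == null) {
            System.err.println("Set YOUTUBE_API_KEY first");
            return;
        }
        try (InputStream in = new URL(channelStatsUrl("pewdiepie", key)).openStream()) {
            String json = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            // Same "timestamp count" line format the question writes to its log file
            System.out.println(System.currentTimeMillis() + " "
                    + parseSubscriberCount(json).orElse(-1));
        }
    }
}
```

Keeping the output in the question's `timestamp count` format means the existing log files and the curve-fitting step can stay unchanged.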
edited Nov 24 '18 at 13:33 by Robin Green
asked Nov 24 '18 at 13:30 by Auruttch