History of YouTube channel's subscriber count












I'm trying to collect a channel's subscriber count over time so I can fit some graphs to it. The program below is crude: it pulls the HTML from https://socialblade.com/youtube/user/pewdiepie/realtime and finds the line that holds the live subscriber count. For some reason the HTML I get back only changes about once an hour, so I don't get data as often as I'd like. Could this have something to do with caching? I don't know much about how networking works in Java; I was just trying to put something together and wanted an easy way to get the data so that I can apply some machine learning or LoggerPro curve fitting to it. I couldn't find a fix by searching on Google, as I'm not really sure what the problem even is. Also, could it be considered a DoS attack if I automatically connect to their site every 10 seconds or so?



import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.URL;

public class Main {
    public static void main(String[] args) throws Exception {

        //String data = "";
        PrintWriter out = new PrintWriter("pewDieSubs" + System.currentTimeMillis() + ".txt");

        long lastTime = System.currentTimeMillis();
        long deltaTime = 0;

        // Sets the User-Agent prefix that URLConnection sends with each request
        System.setProperty("http.agent", "Chrome");
        URL url = new URL("https://socialblade.com/youtube/user/pewdiepie/realtime");
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));

        String inputLine;
        String lastInputLine = "";
        while ((inputLine = in.readLine()) != null) {
            // The line that holds the hidden raw subscriber count
            if (inputLine.contains("<p id=\"rawCount\" style=\"display: none;\">")) {

                // Only record the value when the line has changed since the last fetch
                if (!inputLine.equals(lastInputLine)) {

                    lastInputLine = inputLine;
                    deltaTime = System.currentTimeMillis() - lastTime;
                    lastTime = System.currentTimeMillis();

                    System.out.println(inputLine);

                    // The count is the 8 characters that follow the 40-character opening tag
                    String tmp = "";
                    for (int i = 0; i < 8; i++) {
                        tmp = tmp + inputLine.charAt(40 + i);
                    }
                    System.out.println(tmp + " --- deltaTime = " + deltaTime);

                    //data = data + "\n" + lastTime + " " + tmp;

                    out.println(lastTime + " " + tmp);
                    out.flush();
                }

                // Fetch the page again, then wait 10 seconds before scanning it
                in.close();
                in = new BufferedReader(new InputStreamReader(url.openStream()));
                Thread.sleep(10000);
            }
        }

        in.close();
        out.close();
    }
}










  • There are several problems with your approach. First of all, the way you retrieve the channel's subscriber count is tedious and error-prone. YouTube offers an API that provides this information, which saves you from scraping and parsing raw HTML. If you need to check periodically, create a separate thread that runs on a schedule. At the moment you run everything in the main thread, which you then put to sleep for 10 seconds. Your process is not repeatable and will only run once. I suggest you take a look at both the API and Java threads.

    – Aris_Kortex
    Nov 24 '18 at 14:07
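
The "separate thread that runs on a schedule" idea from the comment above could be sketched roughly as follows. This is only an illustration, not code from the thread: the class name `PeriodicPoller` and the `fetch`/`sink` parameters are made up for the example, and the fetch in `main` is a placeholder rather than a real download.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper: polls a value on a background thread at a fixed rate.
final class PeriodicPoller {

    /** Calls fetch every period and hands each result to sink; returns the scheduler so it can be shut down. */
    static ScheduledExecutorService start(Supplier<String> fetch, Consumer<String> sink,
                                          long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                sink.accept(fetch.get());
            } catch (RuntimeException e) {
                // An exception escaping the task would suppress all later runs, so report it and carry on.
                System.err.println("poll failed: " + e);
            }
        }, 0, period, unit);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        // Placeholder fetch: a real one would download the page or call the API and return the count.
        ScheduledExecutorService scheduler = start(
                () -> String.valueOf(System.currentTimeMillis()),
                value -> System.out.println("sample: " + value),
                10, TimeUnit.SECONDS);
        Thread.sleep(35_000);   // let a few polls happen
        scheduler.shutdown();   // stop scheduling further polls
    }
}
```

Compared with `Thread.sleep` in the main thread, the scheduler keeps the polling interval independent of how long each fetch takes, and the main thread stays free for other work.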











  • @Aris_Kortex Believe it or not, I have run the code and it does not run only once: it fetches the HTML repeatedly. As I said, the problem is that the HTML it gets back only changes about once an hour. It does change and I do get new data, just not as frequently as I'd like. You are right that I should probably use the YouTube API. I just thought that if I could get this working with a minor fix, I could move on to what I actually wanted to do with the data. It would also be interesting to get this to work so that I could use it in the future for some application where there's no API.

    – Auruttch
    Nov 24 '18 at 14:53
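
One way to test the caching guess is to ask for an uncached response and look at the caching headers that come back. The sketch below is a diagnostic only, under the assumption that an intermediary cache sits in front of the site; it does not establish why the page changes only hourly. The class name `CacheProbe` and the helper `servedFromCache` are invented for the example. A missing `Age` header does not prove the response is fresh, and a server is free to ignore `Cache-Control: no-cache` on a request.

```java
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical diagnostic: requests the page once and prints its caching headers.
final class CacheProbe {

    /** True when the Age header says a cache has held the response for at least one second. */
    static boolean servedFromCache(String ageHeader) {
        if (ageHeader == null) {
            return false; // no Age header: nothing can be concluded, treat as "not known to be cached"
        }
        try {
            return Long.parseLong(ageHeader.trim()) > 0;
        } catch (NumberFormatException e) {
            return false; // malformed header value
        }
    }

    public static void main(String[] args) throws Exception {
        URL url = new URL("https://socialblade.com/youtube/user/pewdiepie/realtime");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setUseCaches(false);                              // skip any local cache
        conn.setRequestProperty("Cache-Control", "no-cache");  // ask caches along the way to revalidate
        conn.setRequestProperty("User-Agent", "Chrome");

        String age = conn.getHeaderField("Age");
        System.out.println("Status:        " + conn.getResponseCode());
        System.out.println("Age:           " + age);
        System.out.println("Cache-Control: " + conn.getHeaderField("Cache-Control"));
        System.out.println("Held by a cache: " + servedFromCache(age));
        conn.disconnect();
    }
}
```

If the headers show no sign of caching, another possibility (not verified here) is that the page fills in the live count with JavaScript after loading, in which case the raw HTML would only ever contain an older snapshot.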











  • Got it working using the YouTube API. Thank you @Aris_Kortex

    – Auruttch
    Nov 25 '18 at 13:26













  • I can turn this into a proper answer so that you can upvote it, if you like.

    – Aris_Kortex
    Nov 28 '18 at 17:02











  • @Aris_Kortex Well, I actually used the API in kind of a cheap way and just scraped the HTML from googleapis.com/youtube/v3/…. I didn't change the code much, but the data I get from this site is updated every time I fetch it. I really just wanted to move on to analyzing the data, and this works well enough. I still don't know why Social Blade doesn't give me fresh data, so I don't know whether I've solved the question I asked in the first place. You can still post a proper answer if you think it could be helpful to others.

    – Auruttch
    Dec 6 '18 at 20:22
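
For readers who want to try the API route, here is a minimal sketch. The comment above truncates the URL, so which endpoint was actually called is unknown; the sketch assumes the YouTube Data API v3 `channels` endpoint with `part=statistics`, whose JSON response carries a `subscriberCount` field. The class name `SubscriberCount`, the regex-based parsing (a shortcut in place of a real JSON library), and the API key passed as a command-line argument are all choices made for this example. It needs Java 10 or later.

```java
import java.io.InputStream;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: fetches channel statistics and extracts the subscriber count.
final class SubscriberCount {

    // Matches "subscriberCount": "12345" (quotes around the number are optional).
    private static final Pattern COUNT =
            Pattern.compile("\"subscriberCount\"\\s*:\\s*\"?(\\d+)\"?");

    /** Extracts the subscriber count from a JSON response body. */
    static long parse(String json) {
        Matcher m = COUNT.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no subscriberCount field in response");
        }
        return Long.parseLong(m.group(1));
    }

    /** Downloads the statistics for a legacy username; apiKey is your own API key. */
    static String fetch(String apiKey, String username) throws Exception {
        URL url = new URL("https://www.googleapis.com/youtube/v3/channels"
                + "?part=statistics"
                + "&forUsername=" + URLEncoder.encode(username, StandardCharsets.UTF_8)
                + "&key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8));
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        String json = fetch(args[0], "pewdiepie"); // args[0]: API key
        // Same "timestamp count" line format as the program in the question
        System.out.println(System.currentTimeMillis() + " " + parse(json));
    }
}
```

Keeping `parse` separate from `fetch` means the extraction can be checked against a saved response without touching the network.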
















java web-scraping






edited Nov 24 '18 at 13:33 by Robin Green










asked Nov 24 '18 at 13:30 by Auruttch
























