Is it correct that Deedle/Series is slow compared to a list?

I am working on a data-intensive app and I am not sure whether I should use Deedle's Series/DataFrame. It looks very interesting, but it also seems much slower than the equivalent code written with a List. That said, I may not be using the Series properly when I filter.
Please let me know what you think.



Thanks



open System
open Deedle

type TSPoint<'a> =
    { Date : DateTime
      Value : 'a }

type TimeSerie<'a> = TSPoint<'a> list

let sd = DateTime(1950, 2, 1)
let tsd = [1..100000] |> List.map (fun x -> sd.AddDays(float x))

// creating a list of TSPoint
let tsList = tsd |> List.map (fun x -> { Date = x; Value = 1. })
// creating the same data as a series
let tsSeries = Series(tsd, [1..100000] |> List.map (fun _ -> 1.))

// function to "randomise" the list of dates
let shuffleG xs = xs |> List.sortBy (fun _ -> Guid.NewGuid())

// new date list to search for within tsList and tsSeries
let d = tsd |> shuffleG |> List.take 1000

// Filter: one full scan of the list / series per searched date
d |> List.map (fun x -> tsList |> List.filter (fun y -> y.Date = x))
d |> List.map (fun x -> tsSeries |> Series.filter (fun key _ -> key = x))


Here is what I get:

List   -> Real: 00:00:04.780, CPU: 00:00:04.508, GC gen0: 917, gen1: 2, gen2: 1
Series -> Real: 00:00:54.386, CPU: 00:00:49.311, GC gen0: 944, gen1: 7, gen2: 3










  • tbh, List itself will be very slow. Depending on your use case consider an array.
    – s952163
    Nov 22 '18 at 8:07

  • but an array is mutable, which is something I am not a huge fan of
    – Jeff_hk
    Nov 22 '18 at 8:11

  • Just don't mutate it :D
    – s952163
    Nov 22 '18 at 8:16
















f# deedle






asked Nov 22 '18 at 8:03 by Jeff_hk













1 Answer

In general, Deedle series and data frames do have some extra overhead compared with hand-crafted code that uses the most efficient data structure for a given problem. The overhead is small for some operations and larger for others, so it depends on what you want to do and how you use Deedle.

If you use Deedle the way it was intended to be used, you'll get good performance, but if you run a large number of operations that are not particularly efficient, performance can be bad.

In your particular case, you are running Series.filter 1000 times on the same series. Each call evaluates the predicate for every key and creates a new series (which is what happens behind the scenes here), and that has some overhead.

However, what your code really does is use Series.filter to find the value with a specific key. Deedle provides a key-based lookup operation for this (and it's one of the things it has been optimized for).

If you rewrite the code as follows, you'll get much better performance with Deedle than with a list:



d |> List.map (fun x -> tsSeries.[x])
// 0.001 seconds

d |> List.map (fun x -> (tsSeries |> Series.filter (fun key _ -> key = x)))
// 3.46 seconds

d |> List.map (fun x -> (tsList |> List.filter (fun y -> y.Date = x)))
// 40.5 seconds
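
To illustrate why key-based lookup beats filtering, here is a minimal sketch in plain F# (no Deedle). It is only an analogy: the names `points`, `scanLookup`, `index` and `indexLookup` are made up for this example, the data set is smaller than the one in the question, and a `Dictionary` is an assumed stand-in for a key index, not a description of Deedle's internals.

```fsharp
open System
open System.Collections.Generic

// Hypothetical stand-in data: 10,000 consecutive dates, each mapped to 1.0.
let start = DateTime(1950, 2, 1)
let points = [ for i in 1 .. 10000 -> start.AddDays(float i), 1.0 ]

// Linear scan: every query walks the whole list, as List.filter does.
let scanLookup (date: DateTime) =
    points |> List.filter (fun (d, _) -> d = date) |> List.map snd

// Index built once up front; each query is then a single hash lookup.
let index = Dictionary<DateTime, float>()
for (d, v) in points do
    index.[d] <- v

let indexLookup (date: DateTime) =
    match index.TryGetValue(date) with
    | true, v -> Some v
    | false, _ -> None

// Two dates that exist and one that does not.
let queries = [ start.AddDays(1.0); start.AddDays(5000.0); start.AddDays(20000.0) ]
let scanned = queries |> List.map scanLookup
let indexed = queries |> List.map indexLookup

printfn "%A" scanned
printfn "%A" indexed
```

Each `scanLookup` call visits all 10,000 points, while each `indexLookup` call is a single lookup once the index exists. That is the general shape of the difference between the `Series.filter` and `tsSeries.[x]` timings above.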





  • Thanks, I didn't think of proceeding this way. I will use TryGet so that I get back an OptionalValue and can do some pattern matching instead of getting an exception (in case the value cannot be found).
    – Jeff_hk
    Nov 23 '18 at 2:07

  • @Jeff_hk Yes, TryGet is the way to go if the key or value might be missing!
    – Tomas Petricek
    Nov 23 '18 at 22:07
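
A rough sketch of the TryGet approach mentioned in these comments, assuming Deedle is referenced through NuGet from an F# script (the `#r "nuget: ..."` line needs a recent `dotnet fsi`). The series `s` and the helper `describe` are hypothetical names used only for this illustration. The `TryGet` method returns an `OptionalValue`, while `Series.tryGet` returns an F# option, which is convenient for pattern matching.

```fsharp
#r "nuget: Deedle"
open System
open Deedle

// Hypothetical small series: five consecutive dates, each mapped to 1.0.
let start = DateTime(1950, 2, 1)
let keys = [ for i in 1 .. 5 -> start.AddDays(float i) ]
let s = Series(keys, [ for _ in 1 .. 5 -> 1.0 ])

// Series.tryGet returns an option, so a missing key is handled by
// pattern matching instead of an exception.
let describe (date: DateTime) =
    match Series.tryGet date s with
    | Some v -> sprintf "%s -> %g" (date.ToString("yyyy-MM-dd")) v
    | None -> sprintf "%s -> missing" (date.ToString("yyyy-MM-dd"))

// The method form returns an OptionalValue, which exposes HasValue and Value.
let present = s.TryGet(start.AddDays(1.0))
let absent = s.TryGet(start.AddDays(100.0))

printfn "%s" (describe (start.AddDays(1.0)))
printfn "%s" (describe (start.AddDays(100.0)))
printfn "%b %b" present.HasValue absent.HasValue
```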











answered Nov 22 '18 at 11:43 by Tomas Petricek












