Splitting a string into words and punctuation












52















I'm trying to split a string up into words and punctuation, adding the punctuation to the list produced by the split.



For instance:



>>> c = "help, me"
>>> print(c.split())
['help,', 'me']


What I really want the list to look like is:



['help', ',', 'me']


So, I want the string split at whitespace with the punctuation split from the words.



I've tried to parse the string first and then run the split:



>>> separatedPunctuation = ""
>>> for character in c:
...     if character in ".,;!?":
...         outputCharacter = " %s" % character
...     else:
...         outputCharacter = character
...     separatedPunctuation += outputCharacter
...
>>> print(separatedPunctuation)
help , me
>>> print(separatedPunctuation.split())
['help', ',', 'me']


This produces the result I want, but is painfully slow on large files.



Is there a way to do this more efficiently?










  • 1





    For this example (not the general case) c.replace(' ','').partition(',')

    – Chris_Rands
    Nov 21 '16 at 8:59
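
The one-liner in this comment is easy to check. Here is a minimal sketch; the second input string is a made-up example, not something from the thread. It shows that the call returns a tuple rather than a list, and why it only covers this specific example:

```python
c = "help, me"

# Remove the spaces, then split once around the first comma.
parts = c.replace(' ', '').partition(',')
print(parts)          # ('help', ',', 'me')

# str.partition splits only at the first separator, and removing the spaces
# also glues together words that were separated only by whitespace.
longer = "help, me now, please"   # hypothetical input
longer_parts = longer.replace(' ', '').partition(',')
print(longer_parts)   # ('help', ',', 'menow,please')
```

For anything beyond a single comma between two words, one of the regex-based answers is likely a better fit.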
















python string split






edited Dec 14 '08 at 23:56 by Fionnuala

asked Dec 14 '08 at 23:30 by David A




















10 Answers


















73














This is more or less the way to do it:



>>> import re
>>> re.findall(r"[\w']+|[.,!?;]", "Hello, I'm a string!")
['Hello', ',', "I'm", 'a', 'string', '!']


The trick is not to think about where to split the string, but about what to include in the tokens.



Caveats:




  • The underscore (_) is considered an inner-word character. Replace \w if you don't want that.

  • This will not work with (single) quotes in the string, because the apostrophe is treated as part of a word.

  • Put any additional punctuation marks you want to use in the right half of the regular expression.

  • Anything not explicitly mentioned in the regular expression is silently dropped.
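
The caveats above can be seen in a short sketch. The input strings and the NO_UNDERSCORE variant are illustrative assumptions, not part of the original answer:

```python
import re

# Pattern from the answer: runs of word characters/apostrophes,
# or one of the listed punctuation marks.
TOKEN = re.compile(r"[\w']+|[.,!?;]")

# Characters not named in the pattern (parentheses, colon) vanish from the result.
dropped = TOKEN.findall("a (b) c: d")
print(dropped)              # ['a', 'b', 'c', 'd']

# The underscore counts as a word character, so it stays inside the token.
with_underscore = TOKEN.findall("snake_case, ok")
print(with_underscore)      # ['snake_case', ',', 'ok']

# Single quotes stick to the word they surround.
quoted = TOKEN.findall("say 'hi'")
print(quoted)               # ['say', "'hi'"]

# Hypothetical variant: [^\W_] means "a word character other than underscore".
NO_UNDERSCORE = re.compile(r"(?:[^\W_]|')+|[.,!?;]")
without_underscore = NO_UNDERSCORE.findall("snake_case, ok")
print(without_underscore)   # ['snake', 'case', ',', 'ok']
```

Note that in the variant the underscore itself is dropped, since it is no longer mentioned anywhere in the pattern.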



























  • 2





    If you want to split at ANY punctuation, including ', try re.findall(r"[\w]+|[^\s\w]", "Hello, I'm a string!"). The result is ['Hello', ',', 'I', "'", 'm', 'a', 'string', '!']. Note also that digits are included in the word match.

    – Codie CodeMonkey
    May 15 '12 at 8:21













  • Sorry! could you explain how exactly this is working?

    – Curious
    Feb 5 '16 at 2:36











  • @Curious: to be honest, no, I could not. Because, where should I start? What do you know? Which part is a problem for you? What do you want to achieve?

    – user3850
    Feb 5 '16 at 19:01











  • Never mind! I understood this myself! Thanks for the reply :)

    – Curious
    Feb 5 '16 at 20:39



















30














Here is a Unicode-aware version:



re.findall(r"\w+|[^\w\s]", text, re.UNICODE)


The first alternative catches sequences of word characters (as defined by Unicode, so "résumé" won't turn into ['r', 'sum']); the second catches individual non-word characters, ignoring whitespace.



Note that, unlike the top answer, this treats the single quote as separate punctuation (e.g. "I'm" -> ['I', "'", 'm']). This appears to be standard in NLP, so I consider it a feature.
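
A minimal sketch of the difference, assuming Python 3, where str patterns are Unicode-aware without passing re.UNICODE. The sample text is made up, and re.ASCII is used here only to imitate the non-Unicode behaviour:

```python
import re

text = "r\u00e9sum\u00e9, I'm here!"   # "résumé" written with escapes

# In Python 3, \w in a str pattern matches Unicode word characters by default.
unicode_tokens = re.findall(r"\w+|[^\w\s]", text)
print(unicode_tokens)

# re.ASCII restricts \w to [a-zA-Z0-9_], so each accented letter
# is caught by the second alternative as a separate token.
ascii_tokens = re.findall(r"\w+|[^\w\s]", text, re.ASCII)
print(ascii_tokens)
```

With this pattern nothing is silently dropped except whitespace, which is why the accented letters show up as separate tokens in the ASCII case instead of disappearing.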

























  • 2





    Upvoted because the \w+|[^\w\s] construct is more generic than the accepted answer, but afaik in Python 3 the re.UNICODE shouldn't be necessary

    – rloth
    Jan 5 '15 at 16:21



















5














In Perl-style regular expression syntax, \b matches a word boundary. This should come in handy for doing a regex-based split.



edit: I have been informed by hop that "empty matches" do not work in the split function of Python's re module. I will leave this here as information for anyone else getting stumped by this "feature".
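
For readers on a newer interpreter: as far as I know, re.split gained support for patterns that match the empty string in Python 3.7, so the limitation described here applies to older versions. A minimal sketch, assuming Python 3.7 or later:

```python
import re

s = "help, me"

# Python 3.7+: \b matches the empty string at each word boundary,
# and re.split splits at those positions.
raw = re.split(r"\b", s)
print(raw)      # ['', 'help', ', ', 'me', '']

# Strip surrounding whitespace and drop the empty pieces.
tokens = [piece.strip() for piece in raw if piece.strip()]
print(tokens)   # ['help', ',', 'me']
```

Adjacent punctuation marks stay grouped in one token with this approach, because there is no word boundary between them.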



























  • 1





    only it doesn't because re.split will not work with r'\b'...

    – user3850
    Dec 15 '08 at 1:09











  • What the hell? Is that a bug in re.split? In Perl, split /\b\s*/ works without any problem.

    – Svante
    Dec 15 '08 at 1:29











  • it's kind of documented that re.split() won't split on empty matches... so, no, not /really/ a bug.

    – user3850
    Dec 15 '08 at 1:51






  • 1





    "kind of documented"? Even if it is really documented, it is still not helpful in any way, so I guess it is, in fact, a bug-redeclared-feature.

    – Svante
    Dec 15 '08 at 2:08











  • maybe. i don't know the rationale behind it. you should have checked whether it worked in any case! i cannot remove the downvote anymore, but please consider rewording the passive-aggressive edit -- doesn't help anyone.

    – user3850
    Dec 15 '08 at 9:16



















3














Here's my entry.



I have my doubts as to how well this will hold up in the sense of efficiency, or if it catches all cases (note the "!!!" grouped together; this may or may not be a good thing).



>>> import re
>>> s = "Helo, my name is Joe! and i live!!! in a button; factory:"
>>> l = [item for item in (part.strip() for part in re.split(r"(\W+)", s)) if len(item) > 0]
>>> l
['Helo', ',', 'my', 'name', 'is', 'Joe', '!', 'and', 'i', 'live', '!!!', 'in', 'a', 'button', ';', 'factory', ':']


One obvious optimization would be to compile the regex beforehand (using re.compile) if you're going to be doing this on a line-by-line basis.
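
A minimal sketch of that optimization; tokenize_line and the sample lines are hypothetical names and data chosen for illustration:

```python
import re

# Compile once and reuse the pattern object for every line.
SPLITTER = re.compile(r"(\W+)")

def tokenize_line(line):
    """Split one line into words and runs of punctuation (hypothetical helper)."""
    pieces = (piece.strip() for piece in SPLITTER.split(line))
    return [piece for piece in pieces if piece]

lines = ["help, me", "wait!!! what?"]   # stand-in for lines read from a file
tokens_per_line = [tokenize_line(line) for line in lines]
print(tokens_per_line)   # [['help', ',', 'me'], ['wait', '!!!', 'what', '?']]
```

The re module keeps its own cache of recently compiled patterns, so the gain may be modest; precompiling mainly avoids the cache lookup on each call.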






























  • plus 1 for grouping punctuation.

    – UnsignedByte
    Apr 4 '18 at 3:27



















1














Here's a minor update to your implementation. If you're trying to do anything more detailed, I suggest looking into the NLTK that le dorfier suggested.

This might be only a little faster: it uses ''.join() in place of +=, and ''.join() is known to be faster.



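import string

d = "Hello, I'm a string!"

result = []
word = ''

for char in d:
    if char not in string.whitespace:
        if char not in string.ascii_letters + "'":
            # Punctuation: emit the pending word, then the mark itself.
            if word:
                result.append(word)
            result.append(char)
            word = ''
        else:
            word = ''.join([word, char])
    else:
        # Whitespace ends the current word.
        if word:
            result.append(word)
            word = ''

# Emit the last word if the string does not end in punctuation or whitespace.
if word:
    result.append(word)

print(result)
# ['Hello', ',', "I'm", 'a', 'string', '!']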





























  • i have not profiled this, but i guess the main problem is with the char-by-char concatenation of word. i'd instead use an index and slices.

    – user3850
    Dec 15 '08 at 10:24











  • With tricks i can shave 50% off the execution time of your solution. my solution with re.findall() is still twice as fast.

    – user3850
    Dec 15 '08 at 12:17






  • 1





    You need to call if word: result.append(word) after the loop ends, else the last word is not in result.

    – Roland Pihlakas
    May 25 '17 at 12:15
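
The timing claims in these comments can be checked locally. The sketch below is an assumption-laden comparison, not the commenters' actual benchmark: tokenize_loop is a compact rewrite of the character loop with the final flush added, the regex is my attempt at an equivalent pattern, and the sample text is made up. No timings are shown because they depend on the machine and the Python version.

```python
import re
import string
import timeit

def tokenize_loop(text):
    """Character-by-character version, including the flush after the loop."""
    result, word = [], ''
    for char in text:
        if char in string.whitespace:
            if word:
                result.append(word)
                word = ''
        elif char in string.ascii_letters + "'":
            word += char
        else:
            if word:
                result.append(word)
                word = ''
            result.append(char)
    if word:
        result.append(word)
    return result

# Assumed regex equivalent: runs of ASCII letters/apostrophes, or any single
# character that is neither one of those nor ASCII whitespace.
TOKEN = re.compile(r"[A-Za-z']+|[^A-Za-z' \t\n\r\x0b\x0c]")

def tokenize_regex(text):
    return TOKEN.findall(text)

sample = "Hello, I'm a string! " * 1000   # made-up benchmark input
assert tokenize_loop(sample) == tokenize_regex(sample)

for func in (tokenize_loop, tokenize_regex):
    seconds = timeit.timeit(lambda: func(sample), number=20)
    print(func.__name__, round(seconds, 4))
```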



















0














I think you can find all the help you can imagine in the NLTK, especially since you are using python. There's a good comprehensive discussion of this issue in the tutorial.





































    0














    I came up with a way to tokenize all words and \W+ patterns using \b, which doesn't need hardcoding:



    >>> import re
    >>> sentence = 'Hello, world!'
    >>> tokens = [t.strip() for t in re.findall(r'\b.*?\S.*?(?:\b|$)', sentence)]
    >>> tokens
    ['Hello', ',', 'world', '!']


    Here .*?\S.*? matches a run that contains at least one non-whitespace character, and $ is added to match the last token in the string if it is a punctuation symbol.



    Note the following though -- this will group punctuation that consists of more than one symbol:



    >>> print([t.strip() for t in re.findall(r'\b.*?\S.*?(?:\b|$)', '"Oh no", she said')])
    ['Oh', 'no', '",', 'she', 'said']


    Of course, you can find and split such groups with:



    >>> for token in [t.strip() for t in re.findall(r'\b.*?\S.*?(?:\b|$)', '"You can", she said')]:
    ...     print(re.findall(r'(?:\w+|\W)', token))
    ...
    ['You']
    ['can']
    ['"', ',']
    ['she']
    ['said']
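
The two passes can be flattened into a single list. The sketch below does that and compares the result with the single-pass \w+|[^\w\s] pattern from another answer; the variable names are illustrative assumptions:

```python
import re

sentence = '"You can", she said'

# Two-pass approach from this answer, flattened into one list.
chunks = [t.strip() for t in re.findall(r'\b.*?\S.*?(?:\b|$)', sentence)]
two_pass = [piece for chunk in chunks for piece in re.findall(r'(?:\w+|\W)', chunk)]
print(two_pass)   # ['You', 'can', '"', ',', 'she', 'said']

# Single pass with the \w+|[^\w\s] pattern, for comparison.
one_pass = re.findall(r'\w+|[^\w\s]', sentence)
print(one_pass)   # ['"', 'You', 'can', '"', ',', 'she', 'said']
```

The opening quotation mark is missing from the two-pass result: \b cannot match at the start of the string when the first character is not a word character, so the first match begins at "You".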






































      0














      Try this:



      string_big = "One of Python's coolest features is the string format operator  This operator is unique to strings"
      my_list = []
      x = len(string_big)
      position_of_space = 0
      while position_of_space < x:
          # Scan forward to the next space (or to the end of the string).
          for i in range(position_of_space, x):
              if string_big[i] == ' ':
                  break
              else:
                  continue
          # Each piece keeps its trailing space; punctuation is not separated.
          print(string_big[position_of_space:(i+1)])
          my_list.append(string_big[position_of_space:(i+1)])
          position_of_space = i+1

      print(my_list)






































        0














        If you are going to work in English (or some other common languages), you can use NLTK (there are many other tools to do this such as FreeLing).



        import nltk
        sentence = "help, me"
        nltk.word_tokenize(sentence)






































          -1














          Have you tried using a regex?



          http://docs.python.org/library/re.html#re-syntax





          By the way, why do you need the "," as the second element? You already know that it follows each piece of text, i.e.

          [0]

          ","

          [1]

          ","

          So if you want to add the ",", you can just do it after each iteration when you use the array.






Answer scored 73: answered Dec 15 '08 at 1:53 by user3850; edited Aug 19 '11 at 23:01















Answer scored 30: answered Jan 19 '12 at 17:58 by LaC








            edited Dec 15 '08 at 9:41

























            answered Dec 15 '08 at 0:25









            Svante








            3














            Here's my entry.



            I have my doubts as to how well this will hold up in the sense of efficiency, or if it catches all cases (note the "!!!" grouped together; this may or may not be a good thing).



            >>> import re
            >>> import string
            >>> s = "Helo, my name is Joe! and i live!!! in a button; factory:"
            >>> l = [item for item in map(string.strip, re.split(r"(\W+)", s)) if len(item) > 0]
            >>> l
            ['Helo', ',', 'my', 'name', 'is', 'Joe', '!', 'and', 'i', 'live', '!!!', 'in', 'a', 'button', ';', 'factory', ':']


            One obvious optimization would be to compile the regex beforehand (using re.compile) if you're going to be doing this on a line-by-line basis.
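A sketch of what the precompiled version might look like, written for Python 3 (where string.strip no longer exists as a function, so str.strip is called on each piece instead). The names TOKEN_SPLIT and split_words_and_punct are illustrative, not from the answer:

```python
import re

# Compiled once, reused for every line. The capturing group makes
# re.split keep the separators (runs of non-word characters).
TOKEN_SPLIT = re.compile(r"(\W+)")

def split_words_and_punct(text):
    pieces = (piece.strip() for piece in TOKEN_SPLIT.split(text))
    # Separators that were pure whitespace are empty after strip().
    return [piece for piece in pieces if piece]

line = "Helo, my name is Joe! and i live!!! in a button; factory:"
print(split_words_and_punct(line))
```

When processing a file, the function would simply be called once per line; the pattern object is built only at import time.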






            • plus 1 for grouping punctuation.

              – UnsignedByte
              Apr 4 '18 at 3:27
















            answered Dec 15 '08 at 1:30









            Chris Cameron













            1














            Here's a minor update to your implementation. If you're trying to do anything more detailed, I suggest looking into the NLTK that le dorfier suggested.



            This might be only a little faster, since it uses ''.join(), which is known to be faster, in place of +=.



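            import string

            d = "Hello, I'm a string!"

            result = []
            word = ''

            for char in d:
                if char not in string.whitespace:
                    if char not in string.ascii_letters + "'":
                        # punctuation: flush the pending word, then emit the symbol
                        if word:
                            result.append(word)
                        result.append(char)
                        word = ''
                    else:
                        word = ''.join([word, char])
                else:
                    # whitespace ends the current word
                    if word:
                        result.append(word)
                        word = ''

            # flush the last word if the string does not end in punctuation or whitespace
            if word:
                result.append(word)

            print result
            # Output: ['Hello', ',', "I'm", 'a', 'string', '!']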
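One of the comments on this answer suggests tracking an index and slicing instead of growing word one character at a time. A rough sketch of that idea, in Python 3 syntax; the function name tokenize and the WORD_CHARS set are illustrative choices, not something from the answer or the comment:

```python
import string

# Same notion of a "word character" as the loop above: letters and apostrophe.
WORD_CHARS = set(string.ascii_letters + "'")

def tokenize(text):
    tokens = []
    start = None  # index where the current word began, or None
    for i, ch in enumerate(text):
        if ch in WORD_CHARS:
            if start is None:
                start = i
            continue
        # Any other character ends the word in progress.
        if start is not None:
            tokens.append(text[start:i])
            start = None
        # Non-whitespace characters become single-character tokens.
        if ch not in string.whitespace:
            tokens.append(ch)
    # Flush a word that runs to the end of the string.
    if start is not None:
        tokens.append(text[start:])
    return tokens

print(tokenize("Hello, I'm a string!"))
```

Each word is cut out with a single slice, so no intermediate strings are built while scanning. Whether this is actually faster would need to be measured on real input.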





            • i have not profiled this, but i guess the main problem is with the char-by-char concatenation of word. i'd instead use an index and slices.

              – user3850
              Dec 15 '08 at 10:24











            • With tricks i can shave 50% off the execution time of your solution. my solution with re.findall() is still twice as fast.

              – user3850
              Dec 15 '08 at 12:17






            • 1





              You need to call if word: result.append(word) after the loop ends, else the last word is not in result.

              – Roland Pihlakas
              May 25 '17 at 12:15
















            answered Dec 15 '08 at 1:05









            monkut













            0














            I think you can find all the help you can imagine in the NLTK, especially since you are using Python. There's a good comprehensive discussion of this issue in the tutorial.






                answered Dec 15 '08 at 0:34









                dkretz























                    0














                    I came up with a way to tokenize all words and \W+ patterns using \b which doesn't need hardcoding:



                    >>> import re
                    >>> sentence = 'Hello, world!'
                    >>> tokens = [t.strip() for t in re.findall(r'\b.*?\S.*?(?:\b|$)', sentence)]
                    >>> tokens
                    ['Hello', ',', 'world', '!']


                    Here .*?\S.*? is a pattern matching anything that is not a space, and $ is added to match the last token in a string if it's a punctuation symbol.



                    Note the following though -- this will group punctuation that consists of more than one symbol:



                    >>> print [t.strip() for t in re.findall(r'\b.*?\S.*?(?:\b|$)', '"Oh no", she said')]
                    ['Oh', 'no', '",', 'she', 'said']


                    Of course, you can find and split such groups with:



                    >>> for token in [t.strip() for t in re.findall(r'\b.*?\S.*?(?:\b|$)', '"You can", she said')]:
                    ...     print re.findall(r'(?:\w+|\W)', token)
                    ...
                    ['You']
                    ['can']
                    ['"', ',']
                    ['she']
                    ['said']
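The two passes can probably be collapsed into a single findall. A sketch, assuming Python 3 and that every punctuation symbol should become its own token; the pattern \w+|[^\w\s] and the name tokenize are illustrative choices, not part of the answer above:

```python
import re

# Either a run of word characters, or one character that is
# neither a word character nor whitespace.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def tokenize(text):
    return TOKEN_RE.findall(text)

print(tokenize('"Oh no", she said'))
# ['"', 'Oh', 'no', '"', ',', 'she', 'said']
```

Unlike the \b-based pattern, this also keeps punctuation at the very start of the string, such as the opening quotation mark, which the outputs shown above drop.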





                        edited Apr 15 '14 at 19:16

























                        answered Apr 15 '14 at 19:11









                        FrauHahnhen























                            0














                            Try this:



                            string_big = "One of Python's coolest features is the string format operator  This operator is unique to strings"
                            my_list = []
                            x = len(string_big)
                            position_of_space = 0
                            while position_of_space < x:
                                # advance i to the next space, or to the last character if none is left
                                for i in range(position_of_space, x):
                                    if string_big[i] == ' ':
                                        break
                                # each slice keeps its trailing space
                                print string_big[position_of_space:(i+1)]
                                my_list.append(string_big[position_of_space:(i+1)])
                                position_of_space = i+1

                            print my_list
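The scan above separates on spaces only, so punctuation stays attached to the words. One way to get the tokens the question asks for without a character loop might be to pad each punctuation mark with spaces via str.translate and then call split(). A sketch in Python 3; the PUNCT set is taken from the question, while PAD_TABLE and split_with_punctuation are names invented here:

```python
# Punctuation characters to separate, as listed in the question.
PUNCT = ".,;!?"

# Translation table mapping each mark to itself surrounded by spaces.
PAD_TABLE = str.maketrans({ch: " " + ch + " " for ch in PUNCT})

def split_with_punctuation(text):
    # split() with no arguments collapses the extra whitespace.
    return text.translate(PAD_TABLE).split()

print(split_with_punctuation("help, me"))  # ['help', ',', 'me']
```

The padding happens in one call rather than once per character in Python code, which is the part the question reports as slow.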





                                edited Apr 18 '17 at 9:28









                                Aurasphere











                                answered Apr 18 '17 at 9:03









                                Siddharth Sonone























                                    0














                                    If you are going to work in English (or some other common languages), you can use NLTK (there are many other tools to do this such as FreeLing).



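                                    import nltk
                                    # word_tokenize relies on NLTK's tokenizer models, which have to be fetched once with nltk.download()
                                    sentence = "help, me"
                                    nltk.word_tokenize(sentence)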





                                        edited Nov 9 '18 at 10:30

























                                        answered Nov 8 '18 at 16:16









                                        Fernando S. Peregrino























                                            -1














                                            Have you tried using a regex?



                                            http://docs.python.org/library/re.html#re-syntax





                                            By the way, why do you need the "," as the second element? You already know that one comes after each piece of text, i.e.



                                            [0]



                                            ","



                                            [1]



                                            ","



                                            So if you want to add the "," you can just do it after each iteration when you use the array..
















                                                answered Dec 14 '08 at 23:34









                                                Filip Ekberg

                                                29.8k





























