How to extract specific fields that have no labels from a string read from an image with tesseract-ocr



























I have written a program that asks the user to upload a picture of their DMV license and reads the license details from it using tesseract OCR. The tesseract part works reasonably well, and I get a raw string back. Now I need to parse that string to extract the user's details. The problem is that some fields on the license, such as the name and address, have no labels, yet I still need to extract them. I can't come up with an approach (maybe regex, but I don't know how to make it work). If someone has already done this, I'd love to take a look. Any suggestions are welcome.
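One way to handle the unlabeled fields is to anchor on a line whose format is predictable and read its neighbours by position. In the OCR output of this license, the address ends with a "CITY, ST ZIP" line, the street is the line just above it, and the name is the two lines above that (last name, then "FIRST,MIDDLE"). The sketch below assumes exactly that layout, which holds for the sample output in this question but may not hold for other states or noisier scans; `parse_unlabeled` is a hypothetical helper name.

```python
import re

# A "CITY, ST 12345" line. The ZIP group accepts extra digits because OCR can
# insert them (the sample reads "114116"), so validate the ZIP separately.
CITY_STATE_ZIP = re.compile(r"^([A-Z .'-]+),\s*([A-Z]{2})\s+(\d{5,})(?:-\d{4})?$")


def parse_unlabeled(text):
    """Guess name and address from their position relative to the city line.

    Assumed layout (taken from the sample output): last name, then
    "FIRST,MIDDLE", then the street line, then "CITY, ST ZIP".
    """
    # Drop blank lines so positions are counted over real text lines only.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for i, line in enumerate(lines):
        m = CITY_STATE_ZIP.match(line)
        if m and i >= 3:
            # "PETER,GIOVANNI." -> first "PETER", middle "GIOVANNI"
            first, _, middle = lines[i - 2].rstrip(".,").partition(",")
            return {
                "last_name": lines[i - 3].rstrip(".,"),
                "first_name": first.strip(),
                "middle_name": middle.strip(),
                "street": lines[i - 1],
                "city": m.group(1).strip(),
                "state": m.group(2),
                "zip": m.group(3),
            }
    return None  # no city/state/ZIP line found
```

Anchoring on the city line rather than on fixed line numbers means a garbled or extra header line at the top does not shift the result.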



Image:
DMV LICENSE PICTURE



CODE

Here is the code that reads the uploaded file and extracts the text from the image using tesseract.



    from django.http import HttpResponse, HttpResponseNotAllowed
    from django.views.decorators.csrf import csrf_exempt
    try:
        from PIL import Image
    except ImportError:
        import Image
    import pytesseract

    # Replace the path with the path to the tesseract executable on the server.
    # A raw string keeps the backslashes in the Windows path intact.
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"

    @csrf_exempt
    def upload_dmv(request):
        if request.method != "POST":
            # A Django view must always return a response, never None.
            return HttpResponseNotAllowed(["POST"])
        dmv = request.FILES['dmv']
        # Run OCR on the uploaded image and get the raw text.
        extracted_data = pytesseract.image_to_string(Image.open(dmv))
        print(extracted_data)
        return HttpResponse(b'OK')
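If positions in the flat string turn out to be too unreliable, tesseract can also report where each word is. `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)` returns parallel lists including `text`, `left`, `top` and `conf`, so an unlabeled field can be read from a fixed region of the card. The sketch below filters those lists by a rectangle; `words_in_region` is a hypothetical helper, and the coordinates in the usage comment are placeholders you would measure on a reference image.

```python
def words_in_region(data, left, top, right, bottom, min_conf=0):
    """Join the words of an image_to_data dict whose top-left corner lies in the box."""
    words = []
    for i, text in enumerate(data["text"]):
        text = text.strip()
        # Skip empty rows (block/line entries, which have conf -1) and
        # low-confidence words. float() also accepts conf values returned
        # as strings by some pytesseract versions.
        if not text or float(data["conf"][i]) < min_conf:
            continue
        x, y = data["left"][i], data["top"][i]
        if left <= x < right and top <= y < bottom:
            words.append(text)
    return " ".join(words)

# Usage (coordinates are hypothetical placeholders):
# data = pytesseract.image_to_data(Image.open(dmv), output_type=pytesseract.Output.DICT)
# name = words_in_region(data, 0, 120, 400, 180)
```

Region coordinates only stay meaningful if every upload is cropped and resized to the same card dimensions before OCR.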


Output



W YORK STALE
DRIVER Ee C{ENeSae
876 071652

BOGADO
PETER,GIOVANNI.

9520 93RD ST FL 2
OZONE PARK, NY 114116

SexM_ Height6'-02" Eyes BRO:
00806/06/1992

Expires 06/06/2018
ENONE
RB

Issued 03/09/2017

Usa

~e-h.

Crecutive Deputy Comminsioner of Motor

Class E
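The fields that keep a label, even a garbled one, can be pulled out with one regular expression each. The patterns below are tuned to the output above (missing spaces as in `SexM_` and `Height6'-02"`, trailing punctuation as in `BRO:`) and are assumptions rather than a general DMV format; `parse_labeled` is a hypothetical helper name. The date of birth label was read as `008`, so the sketch treats the first date not preceded by `Expires` or `Issued` as the date of birth.

```python
import re

# One pattern per labeled field; group 1 is the value.
LABELED = {
    "sex": re.compile(r"Sex\s*([MF])"),
    "height": re.compile(r"Height\s*(\d'\s*-?\s*\d{1,2})"),
    "eyes": re.compile(r"Eyes\s*([A-Z]{3})"),
    "expires": re.compile(r"Expires\s*(\d{2}/\d{2}/\d{4})"),
    "issued": re.compile(r"Issued\s*(\d{2}/\d{2}/\d{4})"),
    "license_class": re.compile(r"Class\s*([A-Z])\b"),
    # Assumption: a 9-digit ID, possibly split by spaces as in "876 071652".
    "license_number": re.compile(r"\b(\d{3} ?\d{3} ?\d{3})\b"),
}
DATE = re.compile(r"\d{2}/\d{2}/\d{4}")


def parse_labeled(text):
    """Return a dict of labeled fields; missing fields are None."""
    fields = {}
    for name, pattern in LABELED.items():
        m = pattern.search(text)
        fields[name] = m.group(1) if m else None
    # The DOB label is unreadable ("00806/06/1992"), so use the first date
    # that is not directly preceded by an Expires or Issued label.
    fields["dob"] = None
    for m in DATE.finditer(text):
        if not text[:m.start()].rstrip().endswith(("Expires", "Issued")):
            fields["dob"] = m.group(0)
            break
    return fields
```

Combining this with a positional parser for the name and address covers every field in the sample; anything the patterns miss comes back as `None`, which makes it easy to ask the user to fill it in manually.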









      regex python-2.7 tesseract string-parsing pytesseract






      edited Nov 22 '18 at 7:55 by root
      asked Nov 22 '18 at 7:03 by root