How to get specific field data from a string without field labels, extracted from an image using tesseract-ocr

I have written a program that asks the user to upload a picture of their DMV license and reads the license details from the uploaded picture using Tesseract OCR. The Tesseract part works, to some extent: I get a raw string, and now I need to parse that string to extract the user's details. The problem is that some fields on the DMV license, such as name and address, have no labels. I need to extract these details, but I can't come up with an approach (maybe I could use a regex, but I don't know how to make it work). If someone has already done this, I'd love to take a look. Any suggestions would be welcome.

Image:
DMV LICENSE PICTURE

CODE

Here is the code that reads the uploaded file and gets the text from the image using Tesseract.

    from django.http import HttpResponse
    from django.views.decorators.csrf import csrf_exempt

    try:
        from PIL import Image
    except ImportError:
        import Image

    import pytesseract

    # Replace this with the path to the tesseract executable on the server.
    # A raw string keeps the backslashes literal ("\t" would otherwise be a tab).
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"


    @csrf_exempt
    def upload_dmv(request):
        if request.method == "POST":
            # The uploaded file object can be opened directly by PIL.
            dmv = request.FILES['dmv']
            extracted_data = pytesseract.image_to_string(Image.open(dmv))
            print(extracted_data)
        return HttpResponse(b'OK')

Output

W YORK STALE

DRIVER Ee C{ENeSae

876 071652



BOGADO

PETER,GIOVANNI.



9520 93RD ST FL 2

OZONE PARK, NY 114116



SexM_ Height6'-02" Eyes BRO:

00806/06/1992



Expires 06/06/2018

ENONE

RB



Issued 03/09/2017



Usa



~e-h.



Crecutive Deputy Comminsioner of Motor



Class E
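
To make the parsing problem concrete, here is a minimal sketch of the kind of positional heuristic I have in mind. It is not a working solution. It assumes the unlabeled lines always appear in the order last name, first names, street, city/state/ZIP, and it uses the city/state/ZIP line as an anchor because that line has a recognisable shape. The names `parse_license`, `CITY_STATE_RE` and `DATE_RE` are hypothetical, and the sample text is copied from the output above with repeated blank lines collapsed.

```python
import re

# Sample OCR text copied from the output above.
RAW = """W YORK STALE
DRIVER Ee C{ENeSae
876 071652

BOGADO
PETER,GIOVANNI.

9520 93RD ST FL 2
OZONE PARK, NY 114116

SexM_ Height6'-02" Eyes BRO:
00806/06/1992

Expires 06/06/2018
Issued 03/09/2017
"""

# A date in MM/DD/YYYY form, possibly glued to OCR noise on the left.
DATE_RE = re.compile(r"(\d{2}/\d{2}/\d{4})")

# City, two-letter state, then a run of digits. OCR may add or drop
# digits, so the ZIP is kept raw instead of being validated here.
CITY_STATE_RE = re.compile(
    r"^(?P<city>[A-Z .'-]+),\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5,})"
)


def parse_license(raw):
    """Guess unlabeled fields from their position relative to the city line."""
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    fields = {}

    for i, line in enumerate(lines):
        m = CITY_STATE_RE.match(line)
        if m and i >= 3:
            # Assumed layout: the three non-empty lines above the city line
            # are last name, first names and street address.
            fields["last_name"] = lines[i - 3]
            fields["first_names"] = lines[i - 2].rstrip(".").replace(",", " ")
            fields["street"] = lines[i - 1]
            fields["city"] = m.group("city").strip()
            fields["state"] = m.group("state")
            fields["zip_raw"] = m.group("zip")
            break

    for line in lines:
        m = DATE_RE.search(line)
        if not m:
            continue
        if line.startswith("Expires"):
            fields["expires"] = m.group(1)
        elif line.startswith("Issued"):
            fields["issued"] = m.group(1)
        else:
            # Assumption: the first unlabeled date is the date of birth.
            fields.setdefault("date_of_birth", m.group(1))

    return fields


fields = parse_license(RAW)
```

On this one sample, `fields["last_name"]` is `BOGADO` and `fields["street"]` is `9520 93RD ST FL 2`, but the ZIP still comes back as `114116` because the OCR added a digit, so I doubt this approach is robust across different licenses or noisier scans.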

edited Nov 22 '18 at 7:55

asked Nov 22 '18 at 7:03

root



regex python-2.7 tesseract string-parsing pytesseract
