How to extract specific unlabeled fields from a string read from an image with tesseract-ocr
I have written a program that asks the user to upload a picture of their DMV license and reads the license details from the uploaded picture using Tesseract OCR. The Tesseract part works reasonably well: I get a raw string, and now I need to parse that string to extract the user's details. The problem is that some fields on a DMV license, such as name and address, have no labels. I need to extract these details but can't come up with an approach (maybe a regex would work, but I don't know how to write it). If someone has already done this, I'd love to take a look. Any suggestions would be welcome.
Image:
CODE
Here is the code to read the uploaded file and get text from image using tesseract.
from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt

try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract

# Replace this with the path to the tesseract executable on the server.
# A raw string keeps the backslashes from being read as escapes such as "\t".
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"


@csrf_exempt
def upload_dmv(request):
    if request.method == "POST":
        # Read the uploaded image and run OCR on it.
        dmv = request.FILES['dmv']
        extracted_data = pytesseract.image_to_string(Image.open(dmv))
        print(extracted_data)
    # Always return a response; a Django view must not return None.
    return HttpResponse(b'OK')
Output
W YORK STALE
DRIVER Ee C{ENeSae
876 071652
BOGADO
PETER,GIOVANNI.
9520 93RD ST FL 2
OZONE PARK, NY 114116
SexM_ Height6'-02" Eyes BRO:
00806/06/1992
Expires 06/06/2018
ENONE
RB
Issued 03/09/2017
Usa
~e-h.
Crecutive Deputy Comminsioner of Motor
Class E
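To make the question concrete, here is a rough sketch of the kind of parsing I have in mind, not a working solution. It assumes the layout seen in the output above: the "CITY, ST ZIP" line is the easiest unlabeled line to recognise, so it is used as an anchor, and the three lines above it are taken to be the street address, the given names, and the last name. The function name `parse_license` and the field keys are made up for illustration, and the values are returned exactly as OCR produced them (for example the ZIP code with its extra digit), so they would still need validation.

```python
import re

# Raw OCR text copied from the output above.
RAW = """W YORK STALE
DRIVER Ee C{ENeSae
876 071652
BOGADO
PETER,GIOVANNI.
9520 93RD ST FL 2
OZONE PARK, NY 114116
SexM_ Height6'-02" Eyes BRO:
00806/06/1992
Expires 06/06/2018
ENONE
RB
Issued 03/09/2017
Usa
~e-h.
Crecutive Deputy Comminsioner of Motor
Class E"""

DATE = r"(\d{2}/\d{2}/\d{4})"
# "CITY, ST 12345": the anchor line for the unlabeled fields.
CITY_STATE_ZIP = re.compile(r"^(.+?),\s*([A-Z]{2})\s+(\d+)")


def parse_license(text):
    """Hypothetical helper: pull fields out of the raw OCR string."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    fields = {}

    # Labeled fields: search for the label followed by its value.
    for label, key in (("Expires", "expires"), ("Issued", "issued")):
        m = re.search(label + r"\s*" + DATE, text)
        if m:
            fields[key] = m.group(1)
    m = re.search(r"Sex\s*([MF])", text)
    if m:
        fields["sex"] = m.group(1)
    m = re.search(r"Class\s+([A-Z]+)\b", text)
    if m:
        fields["license_class"] = m.group(1)

    # Date of birth: the first line ending in a date with no known label.
    for ln in lines:
        m = re.search(DATE + r"$", ln)
        if m and not re.match(r"(Expires|Issued)", ln):
            fields["dob"] = m.group(1)
            break

    # Unlabeled fields: located relative to the "CITY, ST ZIP" line.
    for i, ln in enumerate(lines):
        m = CITY_STATE_ZIP.match(ln)
        if m and i >= 3:
            fields["city"], fields["state"], fields["zip"] = m.groups()
            fields["street"] = lines[i - 1]
            # Drop stray punctuation, then split "FIRST,MIDDLE".
            fields["given_names"] = re.split(r"[,\s]+", lines[i - 2].strip(" .,"))
            fields["last_name"] = lines[i - 3]
            break
    return fields


if __name__ == "__main__":
    for key, value in sorted(parse_license(RAW).items()):
        print(key, value)
```

This only works while the line order stays the same. If Tesseract drops or merges a line, or another state's license uses a different layout, the relative offsets would pick up the wrong lines, which is the part I don't know how to handle.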
regex python-2.7 tesseract string-parsing pytesseract
asked Nov 22 '18 at 7:03 by root (edited Nov 22 '18 at 7:55)