Extracting Music Streams from Printouts

I have gone back to work on the Eric Sunderland archive. I also submitted a poster abstract to DMRN+20 (Digital Music Research Network) with some initial comments. It will become part of a talk to be given next year.

A focus for this trip was to take some more photos of the printouts with music and other data. I will be running OCR (Optical Character Recognition) on these using Python’s bindings to OpenCV, and Tesseract on the command line. Initial results have been a bit interesting, though more practice with the tools might help.
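A minimal sketch of what such a pipeline might look like, assuming the `cv2` package (OpenCV's Python bindings) is installed and the `tesseract` executable is on the PATH. The function names, the file names, and the choices of median blur, Otsu thresholding and page segmentation mode 6 are illustrative assumptions, not the settings actually used on the archive photos.

```python
import shutil
import subprocess
from pathlib import Path


def preprocess(src: Path, dst: Path) -> Path:
    """Greyscale and binarise a photo so the print stands out from the paper."""
    import cv2  # imported here so the command helpers below work without OpenCV

    image = cv2.imread(str(src))
    if image is None:
        raise FileNotFoundError(f"could not read image: {src}")
    grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    grey = cv2.medianBlur(grey, 3)  # light denoising before thresholding
    # Otsu's method picks the black/white threshold from the image histogram.
    _, binary = cv2.threshold(grey, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    cv2.imwrite(str(dst), binary)
    return dst


def tesseract_command(image: Path, psm: int = 6, lang: str = "eng") -> list[str]:
    """Build the tesseract call; the output base 'stdout' prints the text."""
    return ["tesseract", str(image), "stdout", "--psm", str(psm), "-l", lang]


def ocr(image: Path) -> str:
    """Run tesseract on an image and return the recognised text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on the PATH")
    result = subprocess.run(
        tesseract_command(image), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cleaned = preprocess(Path("printout.jpg"), Path("printout_clean.png"))
    print(ocr(cleaned))
```

Page segmentation mode 6 treats the page as a single uniform block of text, which may or may not suit printouts that mix music data with prose. It is worth trying other `--psm` values on the mixed pages.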

As a test, I also tried the OCR option on a Samsung phone with a couple of pure-text images. The results are interesting. I still need to check the other pipeline and see how the data is processed. The mixed images on computing paper seem to be causing some issues, so the Python pipeline might be more useful here. I suspect that this was not the intended use for the tool, but I wanted to explore the option.

The aim is to be able to process the OCRed data computationally, using Natural Language Parsing to augment it.
