[vlc-devel] Phonetic playlist search filter in VLC

Dhiraj Lohiya lohiya.dhiraj at gmail.com
Sat Mar 27 07:21:14 CET 2010


Hi

I am working on the *search filter of VLC*, aiming to improve its search
results phonetically, and would like to take this forward as a *GSoC project*.

*Why is this feature important for VLC?*
The present search feature in VLC does a strict string match to provide
results. But the metadata being searched may be spelled differently from the
user's query, since the query reflects how the user pronounces and spells the
term in the English character set rather than the *standard spelling*.
Moreover, languages other than English face spelling standardization issues,
and different people spell the same words in different ways (which are
phonetically the same). Also, quite a few words, in song metadata or
otherwise, are named entities that would not find a match even after a
dictionary lookup.

So it is necessary to improve the user experience here by implementing a
phonetic search feature that finds phonetically matching words on the fly
(not from a dictionary or similar store, which would also be expensive in
memory and other resources).
*How do I plan to proceed?*
I plan to customize the Soundex algorithm for all languages, where each
language can have its own set of phonetic equivalence-class rules
(generally around 20 rules for most languages). I would keep the approach
layered so that others can easily contribute support for more languages.
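
The layered rule idea could be sketched roughly as follows. This is only an
illustration under assumptions: the type phonetic_rules_t, the function
phonetic_key and both rule tables are hypothetical names, not existing VLC
code. The coding is a simplified Soundex (the special h/w handling of full
Soundex is left out), and the second table is a made-up example of a
per-language override, not a vetted rule set for Hindi.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical per-language rule set: one equivalence class per letter
 * 'a'..'z'. Class '0' means the letter is ignored (vowels and similar). */
typedef struct
{
    const char *lang;        /* language tag, e.g. "en" */
    char        classes[27]; /* class digit for 'a'..'z', NUL-terminated */
} phonetic_rules_t;

/* Base layer: the classic Soundex classes for English.
 *                                   abcdefghijklmnopqrstuvwxyz */
static const phonetic_rules_t rules_en =
    { "en", "01230120022455012623010202" };

/* Illustrative override only: same table, but 'w' shares a class with 'v'. */
static const phonetic_rules_t rules_hi_example =
    { "hi", "01230120022455012623011202" };

/* Write a 4-character key (first letter + up to 3 class digits, padded
 * with '0') into out. Only ASCII letters are considered. */
static void phonetic_key(const phonetic_rules_t *rules,
                         const char *word, char out[5])
{
    size_t n = 0;
    char prev = '0';

    for (const char *p = word; *p != '\0' && n < 4; p++)
    {
        int c = tolower((unsigned char)*p);
        if (c < 'a' || c > 'z')
            continue;                      /* skip anything but a..z */

        char cls = rules->classes[c - 'a'];
        if (n == 0)
            out[n++] = (char)toupper(c);   /* keep the first letter as-is */
        else if (cls != '0' && cls != prev)
            out[n++] = cls;                /* new class: emit its digit */
        prev = cls;                        /* runs of one class collapse */
    }
    while (n < 4)
        out[n++] = '0';
    out[4] = '\0';
}
```

With rules_en, "Christina" and "Cristina" both get the key C623, so they
would compare equal. "Diwali" and "Divali" get different keys under rules_en
(D400 and D140) but the same key (D140) under the example override table,
which is the kind of per-language difference the layering is meant to allow.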

Moreover, once someone has defined a base set of rules, it is important that
the rules can themselves be extended and evolve based on user input and
usage.
For instance, if many users (above a threshold set by us) enter a search
string for which no wanted search result is retrieved, we could track what
they finally select and then append to or modify our set of phonetic rules
based on the phonetic mismatch between the query entered and the result
wanted. In this way, *the rule sets could evolve by themselves as we collect
usage statistics from users based on their experience.* This feature would
add a new dimension to the search functionality and would surely stand out.
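
One possible shape for this feedback loop is sketched below. Everything in
it is an assumption made for illustration: the names mismatch_log_t,
log_mismatch and merge_classes, the threshold value, and the restriction to
the easy case where the query and the selected item are lowercase ASCII
words of equal length that differ in exactly one letter. How usage
statistics would really be collected is left open here.

```c
#include <stddef.h>
#include <string.h>

#define MERGE_THRESHOLD 3   /* assumed value; to be tuned */

/* Tally of letter pairs (x, y) seen where a query had x and the item the
 * user finally selected had y in the same position. */
typedef struct
{
    unsigned counts[26][26];
} mismatch_log_t;

/* Record one (query, selected) observation.
 * Returns 1 once the differing letter pair has been seen at least
 * MERGE_THRESHOLD times, otherwise 0. */
static int log_mismatch(mismatch_log_t *log,
                        const char *query, const char *chosen)
{
    size_t len = strlen(query);
    if (len != strlen(chosen))
        return 0;                       /* only equal-length words here */

    int pos = -1;
    for (size_t i = 0; i < len; i++)
    {
        if (query[i] == chosen[i])
            continue;
        if (pos != -1)
            return 0;                   /* more than one difference */
        pos = (int)i;
    }
    if (pos < 0)
        return 0;                       /* identical strings */

    char a = query[pos], b = chosen[pos];
    if (a < 'a' || a > 'z' || b < 'a' || b > 'z')
        return 0;
    if (a > b) { char t = a; a = b; b = t; }   /* count (v,w) and (w,v) together */

    unsigned n = ++log->counts[a - 'a'][b - 'a'];
    return n >= MERGE_THRESHOLD;
}

/* Put letter b into the same equivalence class as letter a.
 * classes holds one class digit per letter 'a'..'z'. */
static void merge_classes(char classes[27], char a, char b)
{
    classes[b - 'a'] = classes[a - 'a'];
}
```

For example, if the query "diwali" repeatedly ends with the user selecting
"divali", the pair (v, w) would eventually cross the threshold, and
merge_classes(classes, 'v', 'w') would make the two letters equivalent in
that language's table. A real version would need some review step before a
rule is changed, since a few unusual users could otherwise skew the rules.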

Initially I plan to code this for a few Indian languages like Hindi, Marathi,
etc., and to define a simple way (probably a GUI based on the concept of
GoogleImageLabeler <http://images.google.com/imagelabeler/>) in which rules
for different languages can be added directly, so that people who know those
languages can contribute.

*Samples:*

   - Some cases with English songs:
      - I want to search for "Nelly Furtado" but spell it as "Neilly
      Furtado". Similarly for "Christina" -> "Cristina" => No results :(
      - Similarly for track names: "Cemeteries Of London" -> "Cemetries Of
      London" or "Lonely" -> "Lonly"
   - In English, we can at least expect/assume that people enter the right
   spellings for dictionary words. But the situation worsens with
   other languages.
   - In the case of Hindi songs, if I search for a song which has the word
   "pyar" or "naiyya" but I spell it as "pyaar" or "nayya", presently no
   result would be returned since that spelling is not in the playlist.
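
To make the samples concrete, here is a small sketch of how a query could be
compared with a title word by word. It is not the existing VLC search code:
word_key, title_has_key and phonetic_match are hypothetical names, the
coding is a simplified Soundex using the classic English classes, words are
assumed to be separated by single spaces, and only ASCII letters are
handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Classic Soundex classes for 'a'..'z'; '0' marks ignored letters. */
static const char CLASSES[] = "01230120022455012623010202";

/* Key for the word of length len starting at w: first letter followed by
 * up to 3 class digits, padded with '0'. */
static void word_key(const char *w, size_t len, char out[5])
{
    size_t n = 0;
    char prev = '0';

    for (size_t i = 0; i < len && n < 4; i++)
    {
        int c = tolower((unsigned char)w[i]);
        if (c < 'a' || c > 'z')
            continue;
        char cls = CLASSES[c - 'a'];
        if (n == 0)
            out[n++] = (char)toupper(c);
        else if (cls != '0' && cls != prev)
            out[n++] = cls;
        prev = cls;
    }
    while (n < 4)
        out[n++] = '0';
    out[4] = '\0';
}

/* Return 1 if some word of title has exactly the given key. */
static int title_has_key(const char *title, const char *key)
{
    for (const char *p = title; *p != '\0'; )
    {
        size_t len = strcspn(p, " ");
        if (len > 0)
        {
            char k[5];
            word_key(p, len, k);
            if (strcmp(k, key) == 0)
                return 1;
        }
        p += len;
        while (*p == ' ')
            p++;
    }
    return 0;
}

/* Return 1 if every word of query sounds like some word of title.
 * Keys are computed on the fly; nothing is looked up in a dictionary. */
static int phonetic_match(const char *query, const char *title)
{
    for (const char *p = query; *p != '\0'; )
    {
        size_t len = strcspn(p, " ");
        if (len > 0)
        {
            char k[5];
            word_key(p, len, k);
            if (!title_has_key(title, k))
                return 0;
        }
        p += len;
        while (*p == ' ')
            p++;
    }
    return 1;
}
```

With this sketch, phonetic_match("Neilly Furtado", "Nelly Furtado") and
phonetic_match("Cemetries Of London", "Cemeteries Of London") both return 1,
and "pyaar" matches "pyar". The keys are coarse, though: "nayya" and
"naiyya" both reduce to N000, but so does "no", so a real filter would
probably need longer keys or some ranking of the results.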

Moreover, now that inclusion of the SQLite
Module <http://mailman.videolan.org/pipermail/vlc-devel/2009-August/065079.html>
is planned, the index of metadata from songs in the library will be readily
accessible to capitalize upon.

*Salient points:* This won't be memory intensive either, since the words are
not retrieved from memory but are generated on the fly from the given set of
rules, which will be fixed for a particular instance.

*What I have done till now:*
As of now, I am having some fun working with VLC's codebase. I coded the
algorithm part of the search filter in Java as part of our college technical
festival. I would re-code it in C for VLC, with further optimizations
wherever possible, along with an intuitive interface for contributing rules
for other languages.

*Attached is an image of some changes I have made to the VLC interface GUI to
include this functionality. When the "Enable phonetic search" button is
clicked, I plan to enable the phonetic search. I can provide the patch if
someone wants it, but right now the search functionality itself hasn't been
modified yet; I have only tweaked the search a bit for the sake of testing.*

Please share your suggestions, opinions and concerns about this feature.

-- 
Regards
Dhiraj Lohiya
3rd year, B.E.(Hons.) Computer Science
BITS Pilani
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vlcPhoneticSearch.png
Type: image/png
Size: 269656 bytes
Desc: not available
URL: <http://mailman.videolan.org/pipermail/vlc-devel/attachments/20100327/9381aa0c/attachment.png>

