IdxSub2Srt v1.3
Convert idx/sub subtitle files to srt
Last update 31/12/2009

5 Euros

Download IdxSub2Srt
Please note that IdxSub2Srt is distributed in the same package as its "brother" project, AVIAddXSubs.

Overview

IdxSub2Srt is a free program that converts existing idx/sub files to the srt text format. Idx/sub files are generated mostly from DVD rips and hold the subtitle contents of those ripped DVDs. They store the subtitles as bitmaps, so converting them to a text format like srt requires some kind of OCR (Optical Character Recognition). IdxSub2Srt provides this function in a way that I think keeps the whole conversion simple and comfortable: without much hassle, a user can convert any subtitle contained in an idx/sub file to its srt equivalent in about 10 minutes.

The OCR function is a simple one based on a kind of pattern matching. The only effort required from the user is to teach the program what text (usually a single letter) each pattern found in the subtitle bitmaps represents. Once the program has learned the whole alphabet in use and every other symbol (numbers, etc.), all subtitles can easily be converted to text.

IdxSub2Srt makes the whole learning process as comfortable and fast as possible, and I think it succeeds very well in this respect. It keeps an OCR database, so every new idx/sub file analyzed is checked against it. Where the patterns are already known, the user only has to recognize the characters still missing from his/her previous efforts.
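
The idea of a reusable pattern database can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program's actual implementation: a pattern is modelled as a tuple of pixel rows, the database as a plain dictionary, and the names `recognize` and `missing_patterns` are hypothetical. The `#` placeholder for unknown patterns is the one the program itself displays.

```python
from typing import Dict, List, Tuple

# Assumption: a pattern is a small bitmap, modelled here as a tuple of pixel rows.
Pattern = Tuple[str, ...]

UNKNOWN = "#"  # placeholder shown for patterns that are not recognized yet


def recognize(patterns: List[Pattern], database: Dict[Pattern, str]) -> str:
    """Return the text for a sequence of patterns, with '#' for unknown ones."""
    return "".join(database.get(p, UNKNOWN) for p in patterns)


def missing_patterns(patterns: List[Pattern], database: Dict[Pattern, str]) -> List[Pattern]:
    """Return the unique patterns still to be taught, in first-seen order."""
    seen = set()
    missing = []
    for p in patterns:
        if p not in database and p not in seen:
            seen.add(p)
            missing.append(p)
    return missing


# Hypothetical 3x3 glyphs standing in for real subtitle patterns.
GLYPH_O = ("###", "#.#", "###")
GLYPH_I = (".#.", ".#.", ".#.")
GLYPH_L = ("#..", "#..", "###")

database = {GLYPH_O: "o", GLYPH_I: "i"}          # learned from earlier files
subtitle = [GLYPH_O, GLYPH_I, GLYPH_L, GLYPH_L]  # patterns of a new subtitle

print(recognize(subtitle, database))              # oi##
print(len(missing_patterns(subtitle, database)))  # 1
database[GLYPH_L] = "l"                           # the user teaches the missing pattern once
print(recognize(subtitle, database))              # oill
```

Teaching the single missing pattern fills in every place where it occurs, which is why a well-stocked database leaves little work for later files.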

At the moment the program can handle English subtitles and those that match the default character set configured on your Windows PC. For example, if my PC is set (through Control Panel/Regional and Language Options/Advanced) to use Greek as the default character set for non-Unicode text, then the program can convert both English and Greek subtitle text.

Converting idx/sub to srt has many advantages. One is that you can recreate the idx/sub file, this time choosing the font, the font size and the position on screen. This is my case with the WDTV (Western Digital TV HD) media player, which has very good support for idx/sub subtitle files. Most of the time the positioning information in the original idx/sub is not correct for this media player (not to mention the font quality and size), so I convert it to srt and then, using AVIAddXSubs (in the same zip package as IdxSub2Srt), convert it back to idx/sub, this time with the appropriate on-screen positioning for WDTV and much better looking, bigger letters.

Another use is to help translators obtain the original subtitles and translate them into another language.

An srt file is also a more versatile format for storing your subtitles together with the related videos, and it takes much less space.

Program Description

  1. Subtitle language Selection. Select the language to extract from the loaded idx/sub. Every idx/sub file can contain many languages.

  2. Load Idx/Sub. Select the idx file to be processed. Only the selected language from this file will be loaded. See 1.

  3. Save. From time to time save your work. Please note that your work is saved automatically every time you exit the program.

  4. Generate Srt. Generates the recognized text for every subtitle bitmap and saves it in the same directory as the loaded idx/sub, using the name of the idx but with the srt extension.

  5. Previous, Next Subtitle (<<, >>). When an idx/sub file is loaded you can browse back and forth through the subtitles. Option 13 changes this behavior slightly. See 13.

  6. Subtitle bitmap. Displays the subtitle bitmap. At the same time, the selected pattern (to be learned/recognized) appears in red. See 7, 8, 9, 15, 16.

  7. Previous, Next Pattern in currently selected subtitle (<<, >>). When an idx file is loaded, the currently selected subtitle has a list of all the unique patterns it contains. With the <<, >> buttons you can browse these patterns and enter at 9 the text/letter that corresponds to each one.

  8. Current Pattern/Text to Display/Learn. The current pattern of the current subtitle appears here in red. The same pattern is shown in red at 6 to help you enter the correct text for it.

  9. Enter Text for currently selected Pattern. In this edit box you enter the text that corresponds to the selected pattern of the selected subtitle.

  10. Use my Edited Text. The recognized text for every subtitle is generated automatically and appears at 14. The user can overwrite this text with his/her own modifications, which the program will then use at srt generation.

  11. Current subtitle/Total subtitles. It displays the currently selected subtitle and the total number of subtitles. When "Only Unknown letters" (13) is checked it displays the current subtitle with unrecognized patterns (always the first) and the total number of subtitles with unrecognized patterns.


  13. Only Unknown letters. When this is checked you can browse only the subtitles that contain unrecognized patterns (5) and only the unrecognized patterns of each subtitle (7). You cannot go back, and you can go forward only after entering the text for the selected pattern. This function is very important for the OCR learning process.

  14. Generated Subtitle text. The generated text for the current subtitle appears here. Every unrecognized pattern appears as # in the text. This text cannot be modified unless you check "Use my Edited Text" (10), in which case the user-provided text is used to generate the final srt file.

  15. Italic. Marks a pattern as italic. Any line of text that contains at least one such letter will be enclosed in <i></i> tags.

  16. All Italics. All patterns of the selected subtitle are marked as italics.

  17. Ignore Subtitle. The subtitle is ignored and is not included in srt generation. This is useful for skipping subtitles intended for the hearing impaired, etc.

  18. Enter here the number of the subtitle to jump to. The jump is made when the Go button (19) is pressed.

  19. Go. Jumps to the subtitle whose number is entered at 18.

Work Flow

The first thing is to select the language to be extracted from idx/sub. This is done through 1.

Select the idx/sub file through 2. The program reads the selected language and extracts the corresponding bitmaps. The bitmaps are analyzed and every separate pattern found in them is entered in a list. Next, the program checks those patterns against any existing OCR database, and if an OCR file is found to share at least 10 patterns with the loaded idx/sub, that file is used. The user now has to teach the program any new patterns. The analysis of an idx/sub file is done only once. When you save your work, either by hand (Save button, 3) or automatically on exiting the program, a .prj file is created in the same directory as the idx/sub file. It includes all the analysis information and the OCR file used. The next time the idx/sub file is loaded and its prj file is present in the same directory, all the needed analysis information is loaded from there.
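
The selection of an OCR file can be pictured roughly as follows. The threshold of 10 shared patterns comes from the text above; everything else is an assumption made for illustration: patterns are reduced to string identifiers, `pick_ocr_file` is a hypothetical name, and preferring the file with the most shared patterns is a guess, since the text does not say what happens when several files qualify.

```python
from typing import Dict, Optional, Set

MIN_SHARED_PATTERNS = 10  # threshold stated in the text


def pick_ocr_file(loaded: Set[str], ocr_files: Dict[str, Set[str]]) -> Optional[str]:
    """Return the name of an OCR file that shares enough patterns, or None.

    Assumption: among qualifying files, the one with the most shared patterns wins.
    """
    best_name: Optional[str] = None
    best_shared = 0
    for name, known in ocr_files.items():
        shared = len(loaded & known)
        if shared >= MIN_SHARED_PATTERNS and shared > best_shared:
            best_name, best_shared = name, shared
    return best_name


# Hypothetical data: pattern identifiers p0..p29 found in the loaded idx/sub.
loaded = {f"p{i}" for i in range(30)}
ocr_files = {
    "OCR1": {f"p{i}" for i in range(5)},   # 5 shared patterns: below the threshold
    "OCR2": {f"p{i}" for i in range(12)},  # 12 shared patterns: qualifies
}
print(pick_ocr_file(loaded, ocr_files))  # OCR2
```

When no file reaches the threshold the function returns `None`, which would correspond to starting the learning process from scratch.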

The first time an idx/sub file is loaded and analyzed (no prj file present), a screen appears asking the user to help the program distinguish the text on the bitmaps as well as possible.

Choose the color that gives the most solid and lean characters in the first subtitle of the idx/sub, which appears in the background on the main screen at 6. The program suggests the color it thinks is best, but you may be able to make a better selection. Generally, if the suggested color gives solid and lean letters, keep it (it is the letter's inner/body color). Avoid colors that represent the outline of the letters.

Please keep in mind that OCR learning is not stored in these prj files (one for every loaded idx/sub file). Your work is saved in the OCR database, a directory named OCR that is created in the directory IdxSub2Srt runs from. It contains pairs of OCR*.txt/OCR*.bin files, which are what really hold your work. The prj files do store some other information, however, such as the text you enter when selecting "Use my Edited Text" (10) and which subtitles are to be ignored in srt generation (17). Apart from this information, all other analysis data can be recovered if a prj file is deleted: the program will load the appropriate OCR file and eventually create a new prj file. Please note that if you delete the OCR database for any reason, all prj files have to be deleted too.

Now the real OCR learning starts. Every subtitle you browse through 5 has a number of patterns extracted during the analysis phase. Your job is to replace the # symbol assigned automatically, which means "unknown pattern", with the text that really corresponds to the selected pattern. The same pattern can occur many times in the same subtitle and, of course, in many other subtitles. For example, in the picture above a pattern with the Greek letter o (omicron) is selected. It appears in red at 8 and is painted red at every place in the subtitle bitmap at 6 where it is found. The letter o occurs in six places in the subtitle bitmap shown.

Every time the appropriate text is entered at 9, the text for the subtitle is regenerated and appears at 14. Progressively, every # is replaced with the user-entered text.

To speed up the work, check the "Only Unknown letters" option (13). It lets you concentrate only on the subtitles and patterns not yet recognized. With this option checked you can browse only forward, and only after you enter text for (recognize) the current pattern. Every time you move to the next unrecognized pattern, you can see at 11 the number of subtitles that are still not completely recognized. If you make a mistake and wish to go back to fix the text entered for a pattern in the current subtitle, just uncheck 13, browse to the pattern, fix it, and check 13 again to continue your work.
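
The forward-only browsing described above behaves roughly like the loop below. It is a sketch under assumptions, not the program's code: patterns are plain string identifiers, the `answers` dictionary stands in for what the user types at 9, and both function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def subtitles_with_unknowns(subtitles: List[List[str]], learned: Dict[str, str]) -> List[int]:
    """Return the indices of subtitles that still contain an unlearned pattern."""
    return [i for i, patterns in enumerate(subtitles)
            if any(p not in learned for p in patterns)]


def next_unknown(subtitles: List[List[str]], learned: Dict[str, str]) -> Optional[Tuple[int, str]]:
    """Return (subtitle index, pattern) for the first unlearned pattern, or None when done."""
    for i, patterns in enumerate(subtitles):
        for p in patterns:
            if p not in learned:
                return i, p
    return None


# Hypothetical data: three subtitles, each a list of pattern identifiers.
subtitles = [["p1", "p2"], ["p2", "p3"], ["p1"]]
learned = {"p1": "a"}               # already in the OCR database
answers = {"p2": "b", "p3": "c"}    # what the user would type at 9

remaining_counts = []
while True:
    step = next_unknown(subtitles, learned)
    if step is None:
        break
    # The counter shown at 11: subtitles not completely recognized yet.
    remaining_counts.append(len(subtitles_with_unknowns(subtitles, learned)))
    _, pattern = step
    learned[pattern] = answers[pattern]  # entering text is what allows moving forward

print(remaining_counts)  # [2, 1]
```

Because a pattern is learned once for all subtitles, the remaining count can drop by more than one after a single entry.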

One thing to pay attention to is how the program inserts spaces to organize the text into words. It uses the distance between patterns and two numbers ("AVRG Normal" & "AVRG Italics", see 12). One number applies to normal style text, the other to italic style text. When the distance between two consecutive patterns is less than the applicable AVRG number, they are considered to belong to the same word. If the distance is bigger than this number, a space is inserted between them. The two numbers are computed from some statistics, but the user can tweak them, watch the result at 14 and decide which values give the best "word separation".
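
The word separation rule can be written down as a short function. The sketch assumes each recognized pattern is given as its text plus the x coordinates of its left and right edges in pixels, and that a gap exactly equal to the AVRG number counts as "same word" (the text does not cover that case). The name `join_patterns` is hypothetical.

```python
from typing import List, Tuple

# Assumption: (text, left_x, right_x) for each pattern, in left-to-right order.
Glyph = Tuple[str, int, int]


def join_patterns(glyphs: List[Glyph], avrg: int) -> str:
    """Join recognized patterns, inserting a space where the gap exceeds avrg pixels."""
    parts = []
    prev_right = None
    for text, left, right in glyphs:
        if prev_right is not None and left - prev_right > avrg:
            parts.append(" ")
        parts.append(text)
        prev_right = right
    return "".join(parts)


# Hypothetical line: the gaps between consecutive patterns are 2, 8, 2 and 2 pixels.
line = [("H", 0, 8), ("i", 10, 12), ("y", 20, 27), ("o", 29, 35), ("u", 37, 43)]

print(join_patterns(line, avrg=5))   # Hi you
print(join_patterns(line, avrg=10))  # Hiyou
print(join_patterns(line, avrg=1))   # H i y o u
```

The three calls show why tweaking the number matters: too high a value glues words together, too low a value splits them into letters.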

When all patterns are recognized then you can press "Generate Srt" (4) to generate the srt file. This will be created in the same directory as the loaded idx/sub file.
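
For readers unfamiliar with the srt format, the output step might look like the sketch below. The srt layout (sequence number, `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, text, blank line) is the standard one; the `<i></i>` tags, the skipping of ignored subtitles and the output file name follow the descriptions at 15, 17 and 4. The `Subtitle` class, the function names and the choice to number the kept subtitles consecutively are assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import List


@dataclass
class Subtitle:
    start_ms: int
    end_ms: int
    lines: List[str]          # recognized text, one entry per line
    italic_lines: List[bool]  # True where the line holds at least one italic letter
    ignored: bool = False     # "Ignore Subtitle" (17)


def fmt_time(ms: int) -> str:
    """Format milliseconds as the srt timestamp HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(subs: List[Subtitle]) -> str:
    """Build the srt text, skipping ignored subtitles and tagging italic lines."""
    blocks = []
    number = 0
    for sub in subs:
        if sub.ignored:
            continue
        number += 1  # assumption: kept subtitles are numbered consecutively
        text = "\n".join(f"<i>{line}</i>" if italic else line
                         for line, italic in zip(sub.lines, sub.italic_lines))
        blocks.append(f"{number}\n{fmt_time(sub.start_ms)} --> {fmt_time(sub.end_ms)}\n{text}\n")
    return "\n".join(blocks)


def srt_path_for(idx_path: str) -> PurePath:
    """Same directory and name as the idx file, with the srt extension."""
    return PurePath(idx_path).with_suffix(".srt")


subs = [
    Subtitle(1000, 2500, ["Hello"], [False]),
    Subtitle(3000, 4000, ["[music]"], [False], ignored=True),
    Subtitle(5000, 6000, ["Bye"], [True]),
]
print(to_srt(subs))
print(srt_path_for("movie.idx"))  # movie.srt
```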

What is New?

1.3
  • A fix was made for better support of RTL languages (Arabic, Farsi, Hebrew). Please note that the program was created with the Greek and Latin alphabets in mind. Languages like Arabic and Farsi, which connect letters to one another, will not be handled in the best way by IdxSub2Srt. That means more work for Arabic and Farsi users.
1.2
1.1
  • Added "Ignore subtitle". Marks the currently selected subtitle to exclude from srt generation.
  • Added ability to jump to any of the subtitles. You enter the number and you press "Go".
