
APPARATUS AND METHOD FOR SYNCHRONIZING A SECONDARY AUDIO TRACK TO THE AUDIO TRACK OF A VIDEO SOURCE


ABSTRACT

Synchronizes a secondary audio track to a video. Analyzes at least one track of a video using audio frequency analysis or spectrograms, image analysis or text analysis to find distinct audio/image/caption events from which to ensure synchronization of a secondary audio track. For example, commentary that mocks a character may be played immediately after a particular noise in the audio track of a video occurs such as a door slam. Keeping the secondary audio track in synch with the audio track of a video is performed by periodically searching for distinct events in a track of a video and adjusting the timing of the secondary audio track. May utilize a sound card on a computer to both analyze a DVD sound track and play and adjust timing of the secondary audio track to maintain synchronization. Secondary audio tracks may be purchased and/or downloaded and utilized to add humorous external commentary to a DVD for example.

What is claimed is:

1. A secondary audio track synchronization apparatus for synchronizing a secondary audio track to an audio track of a video source comprising:
a detection module;
a timing module;
a first event time of an event detected via said detection module wherein said first event occurs in a track associated with a video;
a desired audio event time for said event;
said timing module configured to alter a timing of a secondary audio track based on a difference between said first event time and said desired audio event time wherein said timing of said secondary audio track is adjusted to remain in synchronization with said audio track of said video.

2. The secondary audio track synchronization apparatus of claim 1 wherein said event is detected through frequency analysis of an audio track of said video or via image analysis of a video track of said video or via image or text analysis of a closed/open caption track of said video.

3. The secondary audio track synchronization apparatus of claim 1 wherein said video is a DVD.

4. The secondary audio track synchronization apparatus of claim 1 wherein said video is a high definition DVD.

5. The secondary audio track synchronization apparatus of claim 1 wherein said secondary audio track is an MP3.

6. The secondary audio track synchronization apparatus of claim 1 further comprising:
an event list comprising at least one event time offset and at least one audio event parameter.

7. The secondary audio track synchronization apparatus of claim 1 further comprising an audio card utilized to play said audio track of said video and said secondary audio track simultaneously.

8. A secondary audio track synchronization method for synchronizing a secondary audio track to an audio track of a video source comprising:
detecting a first event time for an event in a track of a video;
obtaining a desired event time for said event;
altering a timing of a secondary audio track based on a difference between said first event time and said desired event time wherein said timing of said secondary audio track is adjusted to remain in synchronization with said audio track of said video.

9. The secondary audio track synchronization method of claim 8 wherein said detecting said audio event occurs through frequency analysis of said audio track of said video or via image analysis of a video track of said video or via image or text analysis of a closed/open caption track of said video.

10. The secondary audio track synchronization method of claim 8 wherein said detecting occurs using an audio track of a video from a DVD.

11. The secondary audio track synchronization method of claim 8 wherein said detecting occurs using an audio track of a video which is playing from a high definition DVD.

12. The secondary audio track synchronization method of claim 8 wherein said altering said secondary audio track occurs using an MP3.

13. The secondary audio track synchronization method of claim 8 further comprising:
utilizing an event list comprising at least one event time offset and at least one audio event parameter.

14. The secondary audio track synchronization method of claim 8 further comprising utilizing an audio card to play said audio track of said video and said secondary audio track simultaneously.

15. A secondary audio track synchronization apparatus for synchronizing a secondary audio track to an audio track of a video source comprising:
detecting a first indirect event time for an indirect event in a track of a video;
obtaining a desired event time for said indirect event;
altering a timing of a secondary audio track based on a difference between said first indirect event time and said desired event time wherein said timing of said secondary audio track is adjusted to remain in synchronization with said audio track of said video.

16. The secondary audio track synchronization apparatus of claim 15 wherein said detecting said indirect event occurs through frequency analysis of said audio track of said video.

17. The secondary audio track synchronization apparatus of claim 15 wherein said detecting occurs using an audio track of a video from a DVD or high definition DVD.

18. The secondary audio track synchronization apparatus of claim 15 wherein said altering said secondary audio track occurs using an MP3.

19. The secondary audio track synchronization apparatus of claim 15 further comprising:
utilizing an indirect event list comprising at least one indirect event time and a description of said indirect event.

20. The secondary audio track synchronization apparatus of claim 15 further comprising utilizing an audio card to play said audio track of said video and said secondary audio track simultaneously.

SPECIFICATION

This application is a continuation in part of U.S. Utility patent application Ser. No. 11/684,460, filed 9 Mar. 2007, the specification of which is hereby incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

Embodiments of the invention described herein pertain to the field of audio/video synchronization systems. More particularly, but not by way of limitation, one or more embodiments of the invention enable an apparatus and method for synchronizing a secondary audio track to the audio track of a video source for example.

2. Description of the Related Art

There is no known apparatus or method for automatically synchronizing a secondary audio track to an audio track of a video source. There are various ways to manually perform synchronization between two audio streams that involve synching the two audio sources based on time (which may be running at a slightly different rate in each source), frame count, or I-frames in the case of MPEG. However, there is often drift of synch between the two sources. This is particularly evident in the case of DVD players which vary slightly in speed and other factors inherent in the multitude of player models as well as the form of compression and parameters of the DVD or other source. Indeed, a secondary source might include various versions that were created using different compression codecs, each with slightly different timing.

There are at least two ways to utilize a secondary audio track with a video source such as a DVD. First, the secondary audio track can be played separately from the DVD (for example a rented DVD) and adjusted manually while playing the secondary audio track, for example on an MP3 player coupled with speakers. This requires adjusting the playback of the secondary audio track to keep the secondary audio track in synchronization with the DVD that is playing. If the DVD is paused, the secondary audio track must be paused at the same time and both sources must be started again at the same time when resuming play. Synchronization may be slightly off when resuming play, so the secondary audio track timing must be adjusted again to ensure synchronization. Slight synchronization errors cause out-of-synch timings of the secondary audio track versus the primary audio track that may fail to provide the intended commentary/humor and may frustrate the user attempting to synchronize the two audio signals.

The second manner in which to utilize a secondary audio track with a video source requires combining the secondary audio track with the audio track of the video source to form a single combined audio track. The current process for combining a secondary audio track with a video source such as a DVD is an extremely technical manual process. The process requires several software tools to perform the required steps. For example, one scenario begins when a DVD is purchased by a user. The user decides to add humorous commentary to the DVD. The commentary is obtained from “RiffTrax.com”, a company that specializes in secondary audio track generation and features commentary tracks from the original writers of “Mystery Science Theater 3000”. The DVD is “ripped” with “DVD Decrypter” or “rejig”. The audio from the DVD is adjusted with “delaycut”. The DVD Audio files are converted to WAV files with “PX3Convert”. The WAV files are manually synched using “Audacity” with a secondary audio track, i.e., the “Riff Track”. The resulting WAV file is converted with “ffmpegGUI” back to DVD format audio (i.e., AC3). The DVD format audio is added to the DVD video and converted to a single file with “Ifoedit” or “rejig”. The single file is then burned onto a DVD with “DVDShrink”.

The aforementioned steps each break down into very technical sub-steps. For example, ripping the files using “rejig” requires the following sub-steps. First, a folder is created on the user's desktop where the work will be performed. After creating the folder, the user inserts the DVD into the computer. The “rejig” program is run. The “rejig” settings are set to “IFO Mode” in the “Settings” and “old engine” is selected. The AC3 Delay box is checked along with any desired foreign language or subs. The output directory folder is selected. Next, “ChapterXtractor” is run, which obtains the chapter times for the DVD. The user is required to edit the chapter times to remove “chapter 1=”, “chapter 2=”, etc., from the front of each line of the output file, leaving one number per line. The one number per line represents the time offsets to each chapter in numeric format. The synchronizing step using “Audacity” uses the following sub-steps. Both the secondary audio track and the audio track of the video are loaded into “Audacity”. The secondary audio track is then cut until the start of the movie lines up with the proper starting point of the secondary audio as indicated in a README file supplied with the secondary audio track. The amount of time to cut is approximate and is used as a guideline to obtain a good first cut at synchronization. The sound level of the secondary audio track is adjusted to make sure that it is loud enough for simultaneous playback with the audio track of the video. The process of cutting away or adding time to the secondary audio continues throughout the playing of the video and is checked for synchronization every few minutes to ensure synchronization is correct. When synchronization is off, the secondary audio track timing is adjusted either by advancing or delaying the secondary audio track, or by slowing down or speeding up the secondary audio track. Although two steps of the main process have been described in more detail, the other steps not broken into sub-steps likewise have many pitfalls and are “expert friendly” at best.

As discussed, the technical competency required to create a “riffed DVD” is extremely high. Certain users have found that alternate tools such as “Delaycut” must be utilized even if the AC3 file indicates a delay of “0 msec”. If using the “goldwave” plugin, then fade-in and fade-out time must be allowed for. These steps put the generation process out of reach for normal users. In addition, although tools such as “sharecrow” have planned features that allow for speeding up and slowing down individual sections of audio, the entire process itself is still manual and highly technical. Other users have reported problems with synchronization when their computers do not have adequate memory; hence, having a very capable computer is another requirement for performing the process.

Although the technical competency required to create a “riffed DVD” is very high, the paramount problem is maintaining synchronization between the video and the secondary audio track. There are many reasons why the secondary audio track goes out of synchronization with the DVD.

One reason for loss of synchronization has to do with different versions of a particular movie. For example, movies sold in certain countries are required to have certain scenes, such as violent scenes, deleted. Hence, there are points throughout the video where the secondary audio track no longer synchs with the video. For example, the PAL version of the movie “The Matrix” sold in the United Kingdom has synching issues at the point where a main character becomes quite violent. Hence, depending on where a DVD is sold, different secondary audio synchronization timings must be employed to synchronize with the remaining portion of the video.

Another reason for loss of synchronization has to do with “drift”. Framerate is a main cause of drift-related problems. This requires checking the video framerate to ensure no compression is utilized prior to synching and ensuring that the right file types are utilized. For example, if the secondary audio track synchs properly with the video when watching the video on another piece of hardware, then the synch issues are certainly related to one of the steps utilized when reauthoring on the PC. The authoring process is simply too complex with too many variables to allow for trivial synchronization. Another cause of drift has to do with certain DVD players running slightly slower or faster than a standard rate. Hence, no absolute starting time offsets can be utilized, since synchronization drifts while a video plays and must be adjusted throughout the video using the manual steps previously described.

Another reason for loss of synchronization has to do with ambiguous synchronization lines in the movie. For example, in the movie “The Fifth Element”, the sixth synchronization line “You have one point on your license” is spoken twice in the movie, once by a computer voice and once by an actor's voice. This causes confusion among users attempting to add the secondary sound track to the video.

For at least these reasons, there is a need for an apparatus and method for synchronizing a secondary audio track to the audio track of a video source.

BRIEF SUMMARY OF THE INVENTION

Embodiments of the invention enable an apparatus and method for synchronizing a secondary audio track to the audio track of a video source for example. In one or more embodiments the secondary audio track is an MP3 that contains commentary, music or other audio. The video may be a movie, news program, television series, advertisement or any other video source. In one or more embodiments, the video may be a DVD (or high definition DVD) and the secondary audio track may include commentary, e.g., of a humorous nature. Any other type of audio may be utilized in the secondary audio track, e.g., sound effects, music, etc. Control of the timing of play of the secondary audio track using embodiments of the invention allows for automatic synchronization between the secondary audio track and the audio track of the video.

Embodiments of the invention may utilize audio techniques or indirect techniques such as closed/open caption (which may for example include sub-pictures or any other channels on which subtitles are delivered), or video analysis for synchronization. One or more embodiments analyze the audio track of a video using audio frequency analysis or spectrograms to find distinct audio events from which to ensure synchronization of a secondary audio track. These embodiments or other embodiments may also analyze the closed/open caption images/text (embedded in the video or within a separate channel for example) associated with the video to find distinct images, text strings in images, or text strings from which to ensure synchronization of a secondary audio track. Other embodiments of the invention may utilize video analysis, for example scene detection or any other image processing algorithm to determine where in a movie the current play point is. Yet other embodiments may utilize any combination of audio and indirect events such as closed/open caption or video analysis to find the timing of events whether they be audio based or associated with any other track on the video besides the audio track.

Audio events are not limited to the spoken word and hence voice recognition systems are but one form of audio analyzer that may be utilized with embodiments of the invention. For example, commentary that mocks a character may be played immediately after an audio event, e.g., particular noise in the audio track of a video occurs, such as a door slam. Keeping the secondary audio track in synch with the audio track of the video is performed by periodically searching for distinct audio events in the audio track of a video and adjusting the timing of the secondary audio track.

Indirect events not associated with the audio track such as closed/open caption events may be utilized in synchronizing the secondary audio track. For example, analyzing an image from the closed/open caption stream and performing any algorithm, for example one that looks up the exact image from a data structure or hash so that the observed time of the closed/open caption image event in the video may be gathered, is in keeping with the spirit of the invention. The observed event time may be utilized in adjusting the timing of the secondary audio track to match the current play point of the audio track of the video. Alternatively, any text associated with the closed/open caption may likewise be utilized to find the current location in the video where the audio is playing and likewise adjust the secondary audio track.

Likewise, indirect events not associated with the audio track such as image events may be utilized in synchronizing the secondary audio track. For example, any algorithm that may detect a scene change, or a particular percentage of color in a frame, or a face showing up in a frame or an explosion or any other image event may be utilized in one or more embodiments of the invention.

Regardless of whether an audio event or indirect event such as a closed/open caption or video event is utilized to determine the current play point of the audio track of the video, the timing may be adjusted by advancing or delaying the play or speeding up or slowing down of the secondary audio track until synchronization is achieved. Alternatively, the secondary audio track may be indexed to allow for event driven playback of portions of the secondary audio track after observing particular audio events. In this scenario, a list of secondary audio tracks or “clips” is simply played at the adjusted synchronization points in time.

Embodiments of the invention may utilize a sound card on a computer to both analyze a DVD sound track and play and adjust timing of the secondary audio track to maintain synchronization. Third party secondary audio tracks may be generated by a user or purchased and/or downloaded, for example from “RiffTrax.com”, and then utilized to add humorous external commentary to a video. Embodiments of the invention allow for bypassing the generation of a “riffed DVD” altogether as the apparatus is capable of synchronizing audio in real-time. Hence, use of rented DVDs (or high definition DVDs) without generating a second DVD is enabled.

Other embodiments may utilize a microphone for example in external configurations where a computer or MP3 player with a microphone is utilized to play and synchronize the secondary audio track to the audio track of a video. These embodiments for example allow an MP3 player configured with a microphone to be taken into a movie theater with the user of the invention able to hear a secondary audio track (for example commentary/music/humorous or any other type of audio) synchronized to a movie through headphones.

Embodiments of the invention utilize a timing module that alters the timing of the secondary audio track based on audio event times detected in the audio track or indirect event times from closed/open captions or video scenes of an associated video for example. The desired event time is compared to the detected audio event time for an audio event and the timing of the secondary audio track is altered based on the time difference to maintain synchronization. The timing may be altered by speeding up or slowing down the secondary audio track to drift the secondary audio track back into synchronization or, alternatively or in combination, the secondary audio track may be advanced or delayed to achieve synchronization. The timing module may make use of the hardware previously described and is not limited to spoken word audio events or image/text based closed/open caption events. Any other method of directly determining the point in time where a video is playing associated audio is in keeping with the spirit of the invention.

Embodiments of the method may detect audio or indirect events associated with the audio such as closed/open caption or video/scene events to obtain a detected event time and alter the timing of the secondary audio track (or tracks whether contiguous in time or not) to maintain synchronization. Any combination of audio events and indirect events may also be utilized together to provide more events from which to synchronize the secondary audio track.

In one or more embodiments, the timing module may make use of a timing list that details the desired audio events and time offsets thereof. The list may further include general sonogram parameters that detail the general shape of the sonogram, i.e., frequency range and amplitudes, in any format that allows for the detection of audio events internal to a computer or external via a microphone for example. The list may further include indirect event parameters, such as hash keys for closed/open caption images, and associated offset(s) into secondary audio track(s) at which to synchronize.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other aspects, features and advantages of the present invention will be more apparent from the following more particular description thereof, presented in conjunction with the following drawings wherein:

FIG. 1 shows a system architecture diagram that includes an internal embodiment of the apparatus.

FIG. 2 shows a system architecture diagram that includes an external embodiment of the apparatus.

FIG. 3 shows a timing diagram for an audio track of a video source and for a secondary audio track showing advance and delay of portions of the secondary audio track to achieve synchronization.

FIG. 4 shows a desired audio event timing list.

FIG. 5 shows a flowchart for an embodiment of the instant method.

DETAILED DESCRIPTION

An apparatus and method for synchronizing a secondary audio track to the audio track of a video source will now be described. In the following exemplary description numerous specific details are set forth in order to provide a more thorough understanding of embodiments of the invention. It will be apparent, however, to an artisan of ordinary skill that the present invention may be practiced without incorporating all aspects of the specific details described herein. In other instances, specific features, quantities, or measurements well known to those of ordinary skill in the art have not been described in detail so as not to obscure the invention. Readers should note that although examples of the invention are set forth herein, the claims, and the full scope of any equivalents, are what define the metes and bounds of the invention.

FIG. 1 shows a system architecture diagram that includes an internal embodiment of the apparatus. In this configuration audio is detected and the secondary audio track is synchronized internally within a computer. Video source 100, in this case a DVD or high definition DVD, is played on DVD player 101. DVD player 101 may be integrated with computer 130 or may be an external DVD player that is coupled with computer 130 electronically, wirelessly or optically to transmit audio to computer 130. The video source is not required to be a DVD and may be an electronic download of a movie or other video broadcast for example. The video may be a movie, news program, television series, advertisement or any other video source. In other embodiments, the secondary audio track may be mixed or played wirelessly through a stereo for example without being combined within a sound card. Any method of playing the synchronized audio generated by embodiments of the invention is in keeping with the spirit of the invention.

Video source 100, when played, yields several tracks. One track is utilized for video that is made up of scenes 110a and 110b for example. Another track includes associated audio track 120, here shown as a sonogram, i.e., a type of spectrogram. Yet another track includes a closed/open caption track having images and/or text 115a-c. A closed/open caption track as used herein includes any track associated with a video that includes images or text descriptive of the audio occurring in the video, including but not limited to subtitle, line 21, line 22, and world system teletext tracks. Any of these types of indirect tracks may be utilized in synchronizing secondary audio with embodiments of the invention.

In one or more embodiments the secondary audio track is an MP3 that contains commentary, music or other audio and may for example include commentary of a humorous nature. Any other type of audio may be utilized in the secondary audio track, e.g., sound effects. For example, the audio events and secondary audio track or any associated clips are not limited to the spoken word.

Audio track 120 of video source 100 is transmitted to (or played on) computer 130 and in the case of audio is directed to sound card 131. Computer 130 may be any type of computer configured to execute program instructions including but not limited to PCs, cell phones and MP3 players. The sound card is sampled by detection module 132 to detect audio events. Audio events that are found are provided to timing module 133 to alter the timing of secondary audio track 140, here also shown as a sonogram.

In another embodiment of the invention, indirect sources not associated with audio track 120 may be analyzed to obtain timing offsets for events. Indirect tracks are transmitted to computer 130 and in the case of image or text data are directed to detection module 132. For example, closed/open caption images or text 115a-c may play at certain times. When these closed/open caption images and/or text are obtained from DVD player 101 via computer 130, the images may be quickly analyzed by detection module 132 to obtain a unique key, for example, that provides a quick reference to look up the event, for example by counting the number of white versus black pixels, or by counting the number of white versus black pixels along a subset of the pixel lines. The caption may be captured into a bitmap and a histogram may be generated, for example, to produce a key from which to look up an offset. If there are multiple keys with the same value, then the first occurrence may be utilized to correlate offsets, so that the second occurrence can be timed based on the first occurrence for example. This, for example, may be faster than decoding the actual text of the caption; however, that technique may also be utilized. Any other method of generating a key associated with a particular closed/open caption is in keeping with the spirit of the invention including but not limited to optical character recognition to obtain a text string from the image.
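As a concrete illustration of the keying approach described above, the following Python sketch (not code from the patent; Pillow and NumPy are assumed, and the helper names are hypothetical) builds a simple key from a caption bitmap by counting light versus dark pixels plus a coarse per-band count, then looks the key up in a table mapping keys to expected offsets.

```python
from PIL import Image
import numpy as np

def caption_key(bitmap_path, threshold=128, bands=8):
    """Build a coarse key for a closed/open caption bitmap."""
    img = np.asarray(Image.open(bitmap_path).convert("L"))  # grayscale pixels
    white = int((img >= threshold).sum())                   # light pixel count
    black = int(img.size - white)                           # dark pixel count
    # Secondary key part: light-pixel counts over horizontal bands of the image.
    rows = np.array_split((img >= threshold).sum(axis=1), bands)
    return (white, black) + tuple(int(r.sum()) for r in rows)

def lookup_offset(event_table, key):
    """event_table maps keys to the caption's expected offset in the video;
    on duplicate keys the first occurrence is kept, as suggested above."""
    return event_table.get(key)  # None if this caption is not a known event
```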

In yet another example of synchronization using an indirect track, video source 100 may be analyzed to determine scene changes, such as when scene 110a changes to scene 110b, or within a scene using other image processing algorithms to determine when an object appears, disappears or changes for example. An example scene change detection algorithm may be implemented, for example, by determining when a certain percentage of the pixels in the image change from one frame to the next. A threshold may be utilized for the percentage and modified until scene changes are correctly detected within any desired error rate.
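A minimal sketch of the pixel-change heuristic just described, assuming consecutive frames arrive as same-sized grayscale NumPy arrays (any decoder could supply them); the thresholds are illustrative and would be tuned as the text suggests.

```python
import numpy as np

def is_scene_change(prev_frame, frame, pixel_delta=30, changed_fraction=0.5):
    """Flag a scene change when enough pixels differ between consecutive frames."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    changed = (diff > pixel_delta).mean()   # fraction of pixels that changed
    return changed > changed_fraction
```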

Other embodiments of the invention may utilize any combination of direct or indirect events, i.e., within audio track 120, or video track of video source 100, or closed/open caption track to obtain events and perform synchronization.

By altering the timing of play of secondary audio track 140, synchronization is maintained by determining the time difference between the audio event and the desired time at which that event should occur. The difference is applied by the timing module to alter the play of secondary audio track 140. Secondary audio track 140 may reside on computer 130 or may be held externally as secondary audio track 140a, for example in MP3 player 150 which is controlled by computer 130 to slow down, speed up, advance or delay secondary audio track 140a. Output of the synchronized combined audio occurs at speaker 160 which may be any type of speaker including self-contained speakers or headphones for example. Control of the timing of play of secondary audio track 140 or 140a using embodiments of the invention allows for automatic synchronization between the secondary audio track 140 (or 140a) and audio track 120 of video source 100.
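The adjustment itself can be summarized in a short sketch. Here `detected_time` is the moment the event was actually observed, measured on the secondary track's playback clock, and `desired_time` is the event's offset in the reference timeline; the `player` object with `position()`, `seek()` and `set_rate()` is a hypothetical stand-in, not a real API.

```python
def resynchronize(player, detected_time, desired_time,
                  jump_threshold=2.0, drift_window=30.0):
    """Advance/delay or speed up/slow down the secondary audio track."""
    correction = desired_time - detected_time  # positive: video is ahead of the track
    if abs(correction) >= jump_threshold:
        player.seek(player.position() + correction)       # hard advance/delay
    elif correction:
        # Spread a small correction over drift_window seconds of playback so the
        # track drifts back into synchronization instead of jumping.
        player.set_rate(1.0 + correction / drift_window)
    else:
        player.set_rate(1.0)                               # already in synch
```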

Embodiments of the invention may analyze audio track 120 of a video source 100 using audio frequency analysis or spectrograms to find distinct audio events from which to ensure synchronization of a secondary audio track. Searching for audio events is not limited to one language track, but may utilize one or more or any combination of the language tracks associated with a video to find events, for example for some languages an event may utilize a short audio response while other languages may utilize a longer audio response for a given phrase. Use of any language track then allows for the easiest phrases to be utilized independent of language. Audio events are not limited to the spoken word and hence voice recognition systems are but one form of audio analyzer that may be utilized with embodiments of the invention. For example, commentary that mocks a character may be played immediately after an audio event, e.g., particular noise in the audio track of a video occurs, such as a door slam. Alternatively, an image in the indirect tracks/streams such as a closed/open caption stream may be analyzed to determine when a particular event occurs.
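One way such an audio event might be found is sketched below, assuming a band/amplitude signature of the kind shown in FIG. 4 (e.g., a “door slam” with energy in roughly 200-800 Hz and 1200-1420 Hz); SciPy's spectrogram routine is used and the dB threshold is illustrative, not a value from the patent.

```python
import numpy as np
from scipy.signal import spectrogram

def find_event(samples, sample_rate,
               bands=((200, 800), (1200, 1420)), min_level_db=-30.0):
    """Return the offset (s) of the first spectrogram frame matching the signature."""
    freqs, times, power = spectrogram(samples, fs=sample_rate, nperseg=1024)
    level_db = 10.0 * np.log10(power + 1e-12)          # per-bin power in dB
    for i, t in enumerate(times):
        if all(level_db[(freqs >= lo) & (freqs <= hi), i].mean() > min_level_db
               for lo, hi in bands):
            return float(t)                            # event observed at offset t
    return None                                        # no matching event in this block
```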

Keeping the secondary audio track in synch with the audio track of the video is performed by periodically searching for distinct events such as audio events in the audio track using detection module 132 and adjusting the timing of the secondary audio track using timing module 133. Detection module 132 may also be configured to analyze images such as from the video track or from the closed/open caption track as well to find event times. The timing may be adjusted by advancing or delaying the play or speeding up or slowing down of the secondary audio track based on the event times as found from the audio/video/caption tracks. Alternatively, the secondary audio track may be indexed to allow for event driven playback of portions of the secondary audio track after observing particular audio events.

Third party secondary audio tracks may be created by a user or purchased and/or downloaded, for example from “RiffTrax.com”, and then utilized to add external commentary or any other type of audio to a video. Embodiments of the invention allow for bypassing the generation of a “riffed DVD” altogether as the apparatus is capable of synchronizing audio in real-time. Hence, use of rented DVDs (or high definition DVDs) without generating a second DVD is enabled.

FIG. 2 shows a system architecture diagram that includes an external embodiment of the apparatus. This configuration is utilized when an external audio or video link, as opposed to an internal audio link, is desired, for example in a theater or in front of a television. In this configuration, sound 180 emanates from speaker 160 and is utilized to couple audio track 120 to a computer or MP3 player (or cell phone with sufficient computer processing power) associated with an embodiment of the invention. In this embodiment, microphone 190 is coupled to computing element 130a which may be a general purpose computer or microprocessor in an MP3 player for example. Microphone 190 is utilized to obtain audio track 120 and pass the audio track to detection module 132 and timing module 133 for controlling the timing of secondary audio track 140a and sound module 131a (a type of sound card for example). Alternatively, or in combination, imaging device 191 may be utilized to detect scene changes, for example via the video source having scenes 110a and 110b, using any available scene change detection algorithm or other image processing algorithm enabled to detect events in a video. Output may be transmitted to headphones 190 or to a standard speaker for example.
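For the external configuration, audio can be captured from a microphone and fed to the same detection logic. A minimal sketch using the `sounddevice` library is shown below; it simply records a short block from the default input device and reuses the `find_event()` spectrogram sketch from above.

```python
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 16_000
blocks = []

def _on_audio(indata, frames, time, status):
    # indata arrives as a (frames, channels) float array; keep a mono copy.
    blocks.append(indata.copy().mean(axis=1))

# Record roughly five seconds from the default input device (the microphone).
with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, callback=_on_audio):
    sd.sleep(5_000)

samples = np.concatenate(blocks)
offset = find_event(samples, SAMPLE_RATE)   # reuse the spectrogram sketch above
```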

This, for example, allows a user to take an MP3 player or cell phone coupled with a microphone and/or camera to a movie theater and, with earphones, hear a synchronized secondary audio track that greatly enhances a movie and in many cases makes a serious or dramatic movie quite humorous.

FIG. 3 shows a timing diagram for an audio track of a video source and for a secondary audio track showing advance and delay of portions of the secondary audio track to achieve synchronization. Embodiments of the invention utilize a timing module (see FIGS. 1, 2) that alters the timing of the secondary audio track (which includes clips 340a and 340b of the track). It will be recognized by one skilled in the art that the secondary audio track may include any number of audio clips formed separately or combined as a whole into one secondary audio track.

Event times associated with events 300 and 301 are detected in either the video track of video source 100, or the closed/open caption track having captions 115a-c, or audio track 120 of an associated video source 100 by the detection module (see FIGS. 1, 2). The desired audio event times 350 and 360 reside at offsets 370 and 371 respectively. The desired audio event times are compared to the detected event times 300 and 301 and the timing of the secondary audio track having clips 340a and 340b is altered based on the time difference to maintain synchronization. The offsets 370 and 371 are compared to the difference between detected event times 300 and 301 and the scheduled audio event times (when the secondary audio clips would play without altering any timing of the currently playing secondary audio track). The timing may be altered by speeding up or slowing down the secondary audio track to drift the secondary audio track back into synchronization or, alternatively or in combination, the secondary audio track may be advanced or delayed to achieve synchronization. In one embodiment clip 340a of the secondary audio track is delayed by T1 while clip 340b is advanced by T2 to achieve synchronization. In another embodiment play is slowed to allow clip 340a to occur later at time 350 as shown in the bottom offset version of clip 340a, while play is sped up to allow clip 340b to occur at time 360. In the case of a deleted scene occurring for example, embodiments of the invention may detect that audio events have jumped forward and hence skip ahead in the secondary audio track to regain synchronization. In general for a given instance of a movie, i.e., a movie for a certain region, the offsets will not jump since there will be no deleted scenes; however, when watching the same movie on TV, many great scenes will be deleted, and jumping may occur often in the external embodiments of the invention.
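For the clip-based variant of FIG. 3, the same measurement can be used to map each clip's reference-timeline start to the local playback clock. A minimal sketch, using the same `detected_time`/`desired_time` convention as the earlier `resynchronize()` sketch; the numbers are illustrative only.

```python
def clock_time_for(ref_start, detected_time, desired_time):
    """Map a clip's reference-timeline start to the local playback clock."""
    # The event that belongs at desired_time was heard at detected_time, so the
    # video runs (desired_time - detected_time) ahead of the local clock.
    return ref_start - (desired_time - detected_time)

# Example: a clip scheduled at 802.5 s in the reference timeline, with the video
# found to be running 1.2 s ahead, should start at 801.3 s on the local clock.
start = clock_time_for(802.5, detected_time=599.8, desired_time=601.0)
```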

FIG. 4 shows a desired audio event timing list 400. In one or more embodiments, the timing module may make use of a timing list that details the desired audio events and time offsets thereof. The list may further include general sonogram parameters that detail the general shape of the sonogram, i.e., frequency range and amplitudes, in any format that allows for the detection of audio events internal to a computer or external via a microphone for example. Desired audio event 401 may include an event name, here for example “door slam”, with a time offset of 10020 and an offset to the associated secondary audio clip set to 300. The description of the audio event may be simple or complex so long as the detection module is provided with enough information to selectively detect the audio event. In this simple example, the main frequency ranges for the event are 200-800 and 1200-1420, with an amplitude greater than 82. Any units may be utilized with embodiments of the invention. Likewise, audio event 402 includes a shout at time offset 18202 with an offset to the associated audio clip within the secondary audio track of 382. Audio event 403 includes a spoken word definition and associated times and offsets. Any number of audio events may be utilized to synchronize a secondary audio track with a video. When a detected audio event occurs before or after it is supposed to, the secondary audio track may be shifted (jumped forward or back) to resynchronize. Desired video event 404, i.e., an event associated with the video track, is here a scene change associated with a value that detection module 132 is configured to generate, the offset from the start of the video (about 39 minutes in), and a clip name to play, “sc2.mp3”. In this case, the format is slightly different from audio events 401-403; however, any format that associates any type of event with the offset of when the event should occur and the audio to play either directly or indirectly (clips versus speeding up or slowing down a single secondary audio track as in 401-403) is in keeping with the spirit of the invention. Likewise, closed/open caption event 405 has a key (or hash) associated with it that the detection module will find during the playing of the video along with the offset to where the caption should occur in the video. This allows for the secondary audio track to be advanced or delayed for example. Had a clip been associated with the event, it could alternatively, or in combination, play with the secondary audio track. Use of XML in representing timing events (whether audio event, video event or closed/open caption event related) is in keeping with the spirit of the invention.
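Since the passage above notes that XML may represent timing events, one possible, purely illustrative layout for a FIG. 4 style list is sketched below and read with Python's standard library. The element and attribute names, the caption key value, and the assumption of millisecond offsets are all assumptions; the numeric values merely echo FIG. 4.

```python
import xml.etree.ElementTree as ET

EVENT_LIST_XML = """
<events>
  <audio   name="door slam"    offset="10020"   clip-offset="300"
           bands="200-800,1200-1420" min-amplitude="82"/>
  <audio   name="shout"        offset="18202"   clip-offset="382"/>
  <video   name="scene change" offset="2340000" clip="sc2.mp3"/>
  <caption key="a41f09"        offset="2411000"/>
</events>
"""

def load_events(xml_text=EVENT_LIST_XML):
    """Return a list of event dicts for the detection and timing modules."""
    events = []
    for element in ET.fromstring(xml_text):
        entry = {"kind": element.tag}   # audio, video or caption event
        entry.update(element.attrib)    # name/offset/clip parameters as strings
        events.append(entry)
    return events
```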

FIG. 5 shows a flowchart for an embodiment of the instant method. The process begins at 500. A first event time is detected at 501 for an event in a track of a video. The track may be audio track 120, or may be the video track associated with video source 100, or the closed/open caption track associated with captions 115a-c for example. Any method may be utilized to detect the events, including frequency analysis of the audio and/or spectrographic analysis, voice recognition software, scene change detection or caption hashing for example. A desired event time for the detected event is obtained at 502. The timing of a secondary audio track based on a difference between the first event time and the desired event time is altered at 503, with the timing of the secondary audio track adjusted to remain in synchronization with the audio track of the video, including the addition of any offsets to secondary audio clip starting times. If there are more audio events to synchronize, as determined at 504, then processing proceeds to 501; else processing ends at 505.
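Tying the sketches above together, the FIG. 5 loop might look like the following; `detector` stands in for detection module 132, `player` for the secondary-track player, and `resynchronize()` is the earlier sketch, so none of these names come from the patent itself.

```python
def run_synchronization(detector, player, events):
    """Detect events in the video's tracks and keep the secondary track in synch."""
    for event in events:                          # e.g. entries from load_events()
        detected_time = detector.wait_for(event)  # 501: first event time observed
        if detected_time is None:
            continue                              # event never seen; try the next one
        desired_time = float(event["offset"])     # 502: desired event time
        # 503: alter the secondary track's timing; units must match the player's.
        resynchronize(player, detected_time, desired_time)
    # 504/505: the loop ends when there are no more events to synchronize.
```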

While the invention herein disclosed has been described by means of specific embodiments and applications thereof, numerous modifications and variations could be made thereto by those skilled in the art without departing from the scope of the invention set forth in the claims.
