Research Journal of Applied Sciences

Year: 2021
Volume: 16
Issue: 2
Page No. 65 - 74

An Inclusive Survey for Text Dependent Automatic Speech Segmentation Techniques

Authors: Ihsan Al-Hassani, Oumayma Al-Dakkak and Abdlnaser Assami

References

Adell, J. and A. Bonafonte, 2004. Towards Phone Segmentation for Concatenative Speech Synthesis. Proceedings of the 5th ISCA Workshop on Speech Synthesis, June 14-16, 2004, International Speech Communication Association, Pittsburgh, Pennsylvania, pp: 139-144.

Adell, J., A. Bonafonte, J.A. Gomez and M.J. Castro, 2005. Comparative study of automatic phone segmentation methods for TTS. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’05), March 18-23, 2005, IEEE, Philadelphia, USA., pp: I/309-I/312.

Akdemir, E. and T. Ciloglu, 2010. HMM topology for boundary refinement in automatic speech segmentation. Electron. Lett., 46: 1086-1087.

Brognaux, S. and T. Drugman, 2015. HMM-based speech segmentation: Improvements of fully automatic approaches. IEEE/ACM. Trans. Audio Speech Lang. Process., 24: 5-15.

Brugnara, F., D. Falavigna and M. Omologo, 1993. Automatic segmentation and labeling of speech based on hidden Markov models. Speech Commun., 12: 357-370.

Chappell, D.T. and J.H. Hansen, 2002. A comparison of spectral smoothing methods for segment concatenation based speech synthesis. Speech Commun., 36: 343-373.

Chen, L., X. Mao and H. Yan, 2016. Text-independent phoneme segmentation combining EGG and speech data. IEEE/ACM. Trans. Audio Speech Lang. Process., 24: 1029-1037.

Esposito, A. and G. Aversano, 2004. Text Independent Methods for Speech Segmentation. In: Nonlinear Speech Modeling and Applications, Chollet, G., A. Esposito, M. Faundez-Zanuy and M. Marinaro (Eds.)., Springer, Berlin, Germany, pp: 261-290.

Frihia, H. and H. Bahi, 2017. HMM/SVM segmentation and labelling of Arabic speech for speech recognition applications. Int. J. Speech Technol., 20: 563-573.

Glass, J.R., 2003. A probabilistic framework for segment-based speech recognition. Comput. Speech Lang., 17: 137-152.

Hemert, J.P.V., 1991. Automatic segmentation of speech. IEEE. Trans. Signal Process., 39: 1008-1012.

Hosom, J.P., 2009. Speaker-independent phoneme alignment using transition-dependent states. Speech Commun., 51: 352-368.

Jarifi, S., D. Pastor and O. Rosec, 2008. A fusion approach for automatic speech segmentation of large corpora with application to speech synthesis. Speech Commun., 50: 67-80.

Keshet, J., S. Shalev-Shwartz, Y. Singer and D. Chazan, 2007. A large margin algorithm for speech-to-phoneme and music-to-score alignment. IEEE. Trans. Audio Speech Lang. Process., 15: 2373-2382.

Khanagha, V., K. Daoudi, O. Pont and H. Yahia, 2014. Phonetic segmentation of speech signal using local singularity analysis. Digital Signal Process., 35: 86-94.

Kim, S., S. Yun and C.D. Yoo, 2011. Large margin discriminative semi-Markov model for phonetic recognition. IEEE. Trans. Audio Speech Lang. Process., 19: 1999-2012.

Kreuk, F., J. Keshet and Y. Adi, 2020. Self-supervised contrastive learning for unsupervised phoneme segmentation. Proc. Interspeech, 1: 3700-3704.

Kreuk, F., Y. Sheena, J. Keshet and Y. Adi, 2020. Phoneme boundary detection using learnable segmental features. Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 4-8, 2020, IEEE, Barcelona, Spain, pp: 8089-8093.

Lee, K.S., 2006. MLP-based phone boundary refining for a TTS database. IEEE. Trans. Audio Speech Lang. Process., 14: 981-989.

Lin, C.Y. and J.S.R. Jang, 2007. Automatic phonetic segmentation by score predictive model for the corpora of Mandarin singing voices. IEEE. Trans. Audio Speech Lang. Process., 15: 2151-2159.

Lin, C.Y., K.T. Chen and J.S.R. Jang, 2005. A hybrid approach to automatic segmentation and labeling for Mandarin Chinese speech corpus. Proceedings of the 9th European Conference on Speech Communication and Technology, September 4-8, 2005, Interspeech, Lisbon, Portugal, pp: 1553-1556.

Ljolje, A., J. Hirschberg and J.P.H.V. Santen, 1997. Automatic Speech Segmentation for Concatenative Inventory Selection. In: Progress in Speech Synthesis, Santen, J.P.H.V., J.P. Olive, R.W. Sproat and J. Hirschberg (Eds.)., Springer, Berlin, Germany, pp: 305-311.

Lo, H.Y. and H.M. Wang, 2007. Phonetic boundary refinement using support vector machine. Proceedings of the 2007 IEEE International Conference on Acoustics Speech and Signal Processing, April 15-20, 2007, IEEE, Honolulu, USA., pp: 933-936.

Malfrère, F., O. Deroo, T. Dutoit and C. Ris, 2003. Phonetic alignment: Speech synthesis-based vs. Viterbi-based. Speech Commun., 40: 503-515.

Matousek, J., D. Tihelka and J. Psutka, 2003. Automatic segmentation for Czech concatenative speech synthesis using statistical approach with boundary-specific correction. Proceedings of 8th European Conference on Speech Communication and Technology, September 1-4, 2003, Eurospeech, Geneva, Switzerland, pp: 301-304.

Mporas, I., T. Ganchev and N. Fakotakis, 2008. Phonetic segmentation using multiple speech features. Int. J. Speech Technol., 11: 73-85.

Mporas, I., T. Ganchev and N. Fakotakis, 2010. Speech segmentation using regression fusion of boundary predictions. Comput. Speech Lang., 24: 273-288.

Park, S.S. and N.S. Kim, 2006. Automatic speech segmentation based on boundary-type candidate selection. IEEE. Signal Process. Lett., 13: 640-643.

Park, S.S. and N.S. Kim, 2007. On using multiple models for automatic speech segmentation. IEEE. Trans. Audio Speech Lang. Process., 15: 2202-2212.

Pellom, B.L. and J.H. Hansen, 1998. Automatic segmentation of speech recorded in unknown noisy channel characteristics. Speech Commun., 25: 97-116.

Qiao, Y. and N. Minematsu, 2008. Metric learning for unsupervised phoneme segmentation. Proceedings of the 9th Annual Conference of the International Speech Communication Association, September 22-26, 2008, Interspeech, Tokyo, Japan, pp: 1060-1063.

Qiao, Y., D. Luo and N. Minematsu, 2013. Unsupervised optimal phoneme segmentation: Theory and experimental evaluation. IET Signal Process., 7: 577-586.

Sahu, P.K., A. Biswas, A. Bhowmick and M. Chandra, 2014. Auditory ERB like admissible wavelet packet features for TIMIT phoneme recognition. Eng. Sci. Technol. Int. J., 17: 145-151.

Stolcke, A., N. Ryant, V. Mitra, J. Yuan, W. Wang and M. Liberman, 2014. Highly accurate phonetic segmentation using boundary correction models and system fusion. Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 4-9, 2014, IEEE, Florence, Italy, pp: 5552-5556.

Toledano, D.T., L.A.H. Gomez and L.V. Grande, 2003. Automatic phonetic segmentation. IEEE. Trans. Speech Audio Process., 11: 617-625.

Wang, H., T. Lee, C.C. Leung, B. Ma and H. Li, 2015. Acoustic segment modeling with spectral clustering methods. IEEE/ACM. Trans. Audio Speech Lang. Process., 23: 264-277.

Yuan, J., N. Ryant, M. Liberman, A. Stolcke, V. Mitra and W. Wang, 2013. Automatic phonetic segmentation using boundary models. Interspeech, 1: 2306-2310.

Zhao, S., Y. Soon, S.N. Koh and K.K. Luke, 2015. A hybrid refinement scheme for intra- and cross-corpora phonetic segmentation. Comput. Speech Lang., 29: 81-97.

© Medwell Publishing 2024 All Rights Reserved