Detection of vocal fold nodules in children based on multi-band cepstral features of sustained vowels

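Lei Jianhan (雷简菡); Liu Yang (刘阳); Liu Boquan (刘伯权); Liu Hengxin (刘恒鑫)

1: School of Humanities, Shanghai Jiao Tong University

2: Senior Department of Otolaryngology Head and Neck Surgery, the Sixth Medical Center of Chinese PLA General Hospital / National Clinical Research Center for Otolaryngologic Diseases

3: National Center for Children's Health / Department of Otolaryngology Head and Neck Surgery, Beijing Children's Hospital, Capital Medical University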

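Abstract
Objective To propose an effective, objective acoustic method of voice assessment for detecting vocal fold nodules in children. Methods Multi-band cepstral analysis was performed on the sustained vowel /a/ produced by 48 children with vocal fold nodules and 40 children with normal voices. For each band, three sets of sub-band features were extracted: 13 Mel-frequency cepstral coefficients (MFCC; MFCC1 to MFCC13); 5 cepstral peak features [the difference in amplitude between the first and second cepstral peaks (difference in amplitude of peaks, DAP), the difference in quefrency between the two peaks (difference in quefrency of peaks, DQP), the energy of each peak (energy of peak, EP; EP1 and EP2), and the cepstral energy between the peaks (energy between cepstral peaks, EEP)]; and 6 cepstral distances (D_1 to D_6). The acoustic feature parameters of the two groups were compared using independent-samples t tests, and the statistically significant indicators were then subjected to receiver operating characteristic (ROC) curve analysis. Results In the vocal fold nodule group, MFCC2, MFCC3, MFCC5, MFCC11, MFCC12, DQP, EP1, and EP2 were significantly higher than in the normal group (P<0.05 or P<0.001), whereas MFCC1, MFCC6, MFCC8, MFCC13, and EEP were significantly lower than in the normal group (P<0.05). ROC curve analysis of these features showed that the area under the ROC curve for combined detection using MFCC1, MFCC2, MFCC3, MFCC5, MFCC6, MFCC8, MFCC11, MFCC12, MFCC13, DQP, EP1, EP2, and EEP was 0.98. Taken individually, MFCC1, MFCC2, MFCC3, MFCC5, MFCC6, MFCC8, MFCC11, MFCC12, DQP, and EP2 each had an area under the ROC curve greater than 0.7, indicating a certain degree of accuracy. Among them, MFCC2 and MFCC3 had areas under the ROC curve of 0.85 and 0.87, respectively, indicating high diagnostic value for voice samples from children with vocal fold nodules. Conclusion A specific combination of acoustic parameters based on multi-band cepstral features of sustained vowels, comprising Mel-frequency cepstral coefficients (MFCC1, MFCC2, MFCC3, MFCC5, MFCC6, MFCC8, MFCC11, MFCC12, MFCC13) and cepstral peak features (DQP, EP1, EP2, EEP), showed high sensitivity and specificity. MFCC2 and MFCC3 in particular showed excellent diagnostic ability in detecting voice disorders associated with vocal fold nodules in children.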
Keywords
Vocal fold nodules; speech disorders in children; acoustic features; Mel-frequency cepstral coefficients
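
The two-step statistical comparison described in the abstract (an independent-samples t test on each feature, then ROC analysis of the features that differ) can be illustrated with a minimal sketch. This is not the authors' code. The feature values below are invented for illustration, the function names `student_t` and `roc_auc` are hypothetical, and the pooled-variance (equal-variance) form of the t test is an assumption, since the abstract does not say which variant was used.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Independent-samples t statistic, pooled-variance form (an assumption)."""
    n_a, n_b = len(group_a), len(group_b)
    dof = n_a + n_b - 2
    # Pooled estimate of the common variance from the two sample variances.
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / dof
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, dof


def roc_auc(positives, negatives):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive case has the
    larger feature value; ties count as half a win."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical values of a single feature; NOT data from the study.
nodule_group = [2.1, 2.6, 3.0, 2.4, 2.9, 3.3]
control_group = [1.8, 2.0, 2.5, 1.6, 2.2, 1.9]

t_stat, dof = student_t(nodule_group, control_group)
auc = roc_auc(nodule_group, control_group)
print(f"t = {t_stat:.2f} (df = {dof}), AUC = {auc:.2f}")
```

The sketch stops at the t statistic; a P value would come from the t distribution with `dof` degrees of freedom, for example via `scipy.stats.ttest_ind`. For a feature that is lower in the nodule group, such as EEP, the pairwise comparison gives an AUC below 0.5 in this orientation, so the group labels or the sign of the feature would be flipped before reporting.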