Harbin Institute of Technology (Shenzhen) - Pattern Recognition - 2017 - Key Exam Topics
Let $\lambda(\alpha_i \mid \omega_j)$ be the loss incurred for taking action $\alpha_i$ when the state of nature is $\omega_j$. Action $\alpha_i$ assigns the sample to one of the classes.

Conditional risk, for $i = 1, \dots, a$:

$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j) P(\omega_j \mid x)$

Select the action $\alpha_i$ for which $R(\alpha_i \mid x)$ is minimum. The overall risk $R$ is then minimum, and $R$ in this case is called the Bayes risk: the best result that can be achieved.

$\lambda_{ij}$: the loss incurred for deciding $\omega_i$ when the true state of nature is $\omega_j$.

- $g_i(x) = -R(\alpha_i \mid x)$: the maximum discriminant corresponds to the minimum risk.
- $g_i(x) = P(\omega_i \mid x)$: the maximum discriminant corresponds to the maximum posterior.
- $g_i(x) \equiv p(x \mid \omega_i) P(\omega_i)$
- $g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i)$

The problem changes from estimating the likelihood to estimating the parameters of a normal distribution. Maximum-likelihood estimation and Bayesian estimation give nearly identical results, but the two methods are conceptually different.

Please present the basic ideas of the maximum likelihood estimation method and the Bayesian estimation method. When do these two methods have similar results?

I. Maximum-likelihood methods view the parameters as quantities whose values are fixed but unknown. The best estimate of their value is defined to be the one that maximizes the probability of obtaining the samples actually observed.
II. Bayesian methods view the parameters as random variables having some known prior distribution. Observation of the samples converts this to a posterior density, thereby revising our opinion about the true values of the parameters.
III. As the number of training samples approaches infinity, the estimate of the mean obtained using the Bayesian estimation method is almost identical to that obtained using the maximum likelihood estimation method.

Minimum-risk decision usually has a lower classification accuracy than minimum-error-rate Bayesian decision. However, minimum-risk decision can avoid possible high risks and losses.

Bayesian parameter estimation method.

(1) Vectorize the samples.
(2) Calculate the mean of all training samples.
(3) Calculate the covariance matrix.
(4) Calculate the eigenvectors and eigenvalues of the covariance matrix.
(5) Build the feature space.
(6) Extract the features of all samples: calculate the feature value of every sample.
(7) Calculate the feature value of the test sample.
(8) Calculate the feature values of the training samples in the same way.
(9) Find the nearest training sample as the result.

Exercises

1. How to use the prior and likelihood to calculate the posterior? What is the formula?

$P(\omega_j \mid x) = \dfrac{p(x \mid \omega_j) P(\omega_j)}{p(x)}$, with $\sum_j P(\omega_j) = 1$ and $\sum_j P(\omega_j \mid x) = 1$. In the two-class case, $p(x) = \sum_{j=1}^{2} p(x \mid \omega_j) P(\omega_j)$.

2. What is the difference in the ideas of the minimum error Bayesian decision and the minimum risk Bayesian decision? What is the condition that makes the minimum error Bayesian decision identical to the minimum risk Bayesian decision?

Answer: In a two-class problem, if $\lambda_{12} = \lambda_{21}$ (the so-called symmetric loss function, with zero loss for correct decisions, $\lambda_{11} = \lambda_{22} = 0$), then the minimum risk Bayesian decision and the minimum error Bayesian decision are identical.

The minimum error Bayesian decision minimizes the classification error. The minimum risk Bayesian decision minimizes the risk, that is, the expected loss. If $R(\alpha_1 \mid x) < R(\alpha_2 \mid x)$, action $\alpha_1$ ("decide $\omega_1$") is taken, where

$R(\alpha_1 \mid x) = \lambda_{11} P(\omega_1 \mid x) + \lambda_{12} P(\omega_2 \mid x)$
$R(\alpha_2 \mid x) = \lambda_{21} P(\omega_1 \mid x) + \lambda_{22} P(\omega_2 \mid x)$

3. A person takes a lab test for nuclear radiation and the result is positive. The test returns a correct positive result in 99% of the cases in which nuclear radiation is actually present, and a correct negative result in 95% of the cases in which nuclear radiation is not present. Furthermore, 3% of the entire population is radioactively contaminated. Is this person contaminated?

Answer: The probability of being contaminated is $P(\omega_1) = 0.03$ and the probability of not being contaminated is $P(\omega_2) = 0.97$. Let $X$ denote a positive result and $\bar{X}$ a negative result. Then $P(X \mid \omega_1) = 0.99$ and $P(\bar{X} \mid \omega_2) = 0.95$, so

$P(\omega_1 \mid X) = \dfrac{P(X \mid \omega_1) P(\omega_1)}{\sum_{i=1}^{2} P(X \mid \omega_i) P(\omega_i)} = \dfrac{0.99 \times 0.03}{0.99 \times 0.03 + (1 - 0.95) \times 0.97} \approx 0.38$

$P(\omega_2 \mid X) = 1 - P(\omega_1 \mid X) \approx 0.62$

By the Bayes decision rule, $P(\omega_2 \mid X) > P(\omega_1 \mid X)$, so this person is classified as not contaminated.
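The arithmetic in Question 3 can be checked with a short script. This is only a minimal sketch: the function name `posterior` and the variable names are illustrative and do not come from the original notes.

```python
def posterior(priors, likelihoods):
    """Return P(w_j | x) for each class j, given P(w_j) and p(x | w_j)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # p(x) = sum_j p(x | w_j) P(w_j)
    return [j / evidence for j in joint]


priors = [0.03, 0.97]            # P(w1): contaminated, P(w2): not contaminated
likelihoods = [0.99, 1 - 0.95]   # P(X | w1), P(X | w2) for a positive result X
post = posterior(priors, likelihoods)

# Minimum-error Bayes decision: pick the class with the largest posterior.
decision = max(range(len(post)), key=lambda j: post[j])
print([round(p, 2) for p in post], decision)  # prints [0.38, 0.62] 1
```

Index 1 corresponds to $\omega_2$ (not contaminated), which agrees with the answer above: the low prior outweighs the high sensitivity of the test.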
4. Please present the basic ideas of the maximum likelihood estimation method and the Bayesian estimation method. When do these two methods have similar results?

Answer:
I. Given a sample set $X$, we are asked to find an estimator $\hat{\theta}$ of a true parameter $\theta$ of the population distribution that $X$ is drawn from, such that the resulting Bayes risk is minimized. This is the idea of Bayesian estimation. (Put another way: the parameter to be estimated is treated as a random variable that follows some prior distribution. Observing the samples converts the prior density into a posterior density, so the information in the samples revises the initial estimate of the parameter.)
II. The idea of maximum likelihood estimation is simple: given the experimental outcome that has already been observed, we look for the $\theta$ that makes this outcome most likely and take it as the estimate of the true $\theta$.
III. As the number of training samples approaches infinity, the estimate of the mean obtained using the Bayesian estimation method is almost identical to that obtained using the maximum likelihood estimation method.

Aside: prior + samples.

I. Maximum-likelihood methods view the parameters as quantities whose values are fixed but unknown. The best estimate of their value is defined to be the one that maximizes the probability of obtaining the samples actually observed.
II. Bayesian methods view the parameters as random variables having some known prior distribution. Observation of the samples converts this to a posterior density, thereby revising our opinion about the true values of the parameters.

5. Please present the nature of principal component analysis.

Answer: Principal component analysis (PCA) uses the idea of dimensionality reduction to convert many indicators into a few composite indicators.
- Capture the component that varies the most (largest variation).
- The component that varies the most contains the main information of the samples (most information).
- PCA is also said to be the optimal representation method, because it yields the minimum reconstruction error.
- As the transform axes of PCA are orthogonal, it is also referred to as an orthogonal transform method.
- PCA is also a de-correlation method.
- PCA can also be used as a compression method and is able to obtain a high compression ratio.

6. Describe the basic idea and possible advantage of Fisher discriminant analysis.

Answer: The Fisher criterion is a classical pattern recognition method. It treats the product of the normal vector of a linear method with a sample as the projection of the sample vector onto the unit normal vector. The result is similar to the Bayes decision for normal distributions with equal covariance matrices, which shows that if the two classes are distributed similarly around their respective means, the Fisher criterion can give a small error rate.
- Supervised.
- Maximizes the between-class distance and minimizes the within-class distance.
- Exploits the training samples to produce the transform axes.
- Further points: the number of effective Fisher transform axes is $c - 1$; a singular within-class scatter matrix can be avoided with PCA + FDA.

7. What is the K nearest neighbor classifier? Is it reasonable?

Answer: The basic idea of the nearest neighbor method is to assign the test sample $x$ to the class that occurs most often among its $k$ nearest neighbors. That is, first find the class of each of the $k$ nearest neighbors of $x$, then decide the class of $x$. If the samples are relatively sparse, deciding the class of $x$ only from the first $k$ neighbors, without considering the differences in their distances, is inappropriate, especially when $k$ is large.

The K nearest neighbor classifier follows the k-nearest-neighbor rule: it classifies $x$ by assigning it the label most frequently represented among the $k$ nearest samples. In other words, a decision is made by examining the labels of the $k$ nearest neighbors and taking a vote.

8. Is it possible that a classifier can obtain a higher accuracy for any dataset than any other classifier?

Answer: No, clearly not. There are many reasons.

9. Please describe the over-fitting problem.

Answer: Over-fitting means making the hypothesis excessively complex in order to obtain a hypothesis that is consistent with the data. Imagine a learning algorithm that produces an over-fit classifier. This classifier classifies the training samples 100% correctly (given any document from the samples, it never makes a mistake), but to achieve this its construction is so elaborate and its rules so strict that any document slightly different from the training samples is judged not to belong to the class. The over-fitting problem is that the classifier separates too finely and too specifically.

Over-fitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations.
A model that has been over-fit will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data.

10. Usually a more complex learning algorithm can obtain a higher accuracy in the training stage. So, should a more complex learning algorithm be favored?

Answer: No. There are no context-independent or usage-independent reasons to favor one learning or classification method over another to obtain good generalization performance. In the absence of assumptions we should not prefer any learning or classification algorithm over another (the No Free Lunch Theorem). When confronting a new pattern recognition problem, we need to focus on the aspects that matter: prior information, data distribution, amount of training data, and cost or reward functions.

Ugly Duckling Theorem: an analogous theorem that addresses features and patterns.

11. As the number of training samples approaches infinity, the estimate of the mean obtained using the Bayesian estimation method is almost identical to that obtained using the maximum likelihood estimation method. Is this statement correct?

Answer: Yes. The reasoning is the same as for Question 4 (no further reference found).

12. Can the minimum squared error procedure be used for binary classification?

Answer: Yes, the minimum squared error procedure can be used for binary classification. It is based on

$Ya = b$, with $Y = [Y_1, \dots, Y_n]^T$ and $Y_i = [y_{i0}, y_{i1}, \dots, y_{id}]^T$.

A simple way to set $b$: if $Y_i$ is from the first class, then $b_i$ is set to $1$; if $Y_i$ is from the second class, then $b_i$ is set to $-1$.

Another simple way to set $b$: if $Y_i$ is from the first class, then $b_i$ is set to $n/n_1$; if $Y_i$ is from the second class, then $b_i$ is set to $-n/n_2$.

13. Can you devise a minimum squared error procedure to perform multiclass classification?
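The $\pm 1$ scheme for $b$ described under Question 12 can be sketched for the simplest case of one-dimensional features. The following is a minimal illustration, not the course's reference solution: it assumes each sample is augmented with a leading 1 (so $Y_i = [1, x_i]^T$) and solves the normal equations $(Y^T Y) a = Y^T b$ directly. The function names `mse_fit_1d` and `mse_classify` and the toy data are hypothetical.

```python
def mse_fit_1d(xs, labels):
    """Least-squares solution of Y a = b for 1-D features.

    Row i of Y is the augmented sample [1, x_i] (an assumption of this sketch);
    b_i is +1 for class 1 and -1 for class 2.
    """
    b = [1.0 if lab == 1 else -1.0 for lab in labels]
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sb = sum(b)
    sxb = sum(x * bi for x, bi in zip(xs, b))
    # Y^T Y = [[n, sx], [sx, sxx]] and Y^T b = [sb, sxb]; solve by Cramer's rule.
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("Y^T Y is singular; the samples must not all be equal")
    a0 = (sxx * sb - sx * sxb) / det
    a1 = (n * sxb - sx * sb) / det
    return a0, a1


def mse_classify(a, x):
    """Decide class 1 if a^T [1, x] > 0, otherwise class 2."""
    a0, a1 = a
    return 1 if a0 + a1 * x > 0 else 2


xs = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = [1, 1, 1, 2, 2, 2]
a = mse_fit_1d(xs, labels)
print(mse_classify(a, 2.5), mse_classify(a, 7.5))  # prints 1 2
```

For this symmetric toy set the decision boundary $a_0 + a_1 x = 0$ falls at $x = 5$, midway between the two class means. For higher-dimensional features the same normal equations apply, but a general linear solver or pseudoinverse is needed.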
14. Which kind of applications is the Markov model suitable for?

Answer: The Markov model has found its greatest use in problems that have an inherent temporality, for instance speech recognition or gesture recognition. Three central problems arise:
- The evaluation problem
- The decoding problem
- The learning problem

15. For the minimum squared error procedure based on $Ya = b$ ($Y$ is the matrix consisting of all the training samples), if we have a proper $b$ and criterion function, then this minimum squared error procedure might be equivalent to Fisher discriminant analysis. Is this statement correct?

Written out, $Ya = b$ is

$\begin{pmatrix} y_{10} & y_{11} & \cdots & y_{1d} \\ y_{20} & y_{21} & \cdots & y_{2d} \\ \vdots & \vdots & & \vdots \\ y_{n0} & y_{n1} & \cdots & y_{nd} \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_d \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$

Answer: See page 198 of the Chinese textbook, or page 289 of the English textbook PDF, Section 5.8.2; see also the lecture slides on Docin.

16. Suppose that the number of training samples approaches infinity. Then the minimum error Bayesian decision will perform better than any other classifier, achieving a lower classification error rate. Do you agree with this?

Answer: To be determined.

17. What are the upper and lower bounds of the classification error rate of the K nearest neighbor classifier?

Answer: The error rate of the k-nearest-neighbor method depends on $k$. For $k = 1$ (the nearest neighbor rule), in the large-sample limit the lower and upper bounds are the Bayes error rate $P^*$ and $P^*\left(2 - \frac{c}{c-1} P^*\right)$, respectively. As $k$ increases, the upper bound gradually approaches the lower bound, the Bayes error rate $P^*$. As $k$ tends to infinity, the two bounds coincide, $P = P^*$, and the k-nearest-neighbor method approaches the Bayes decision and becomes optimal.

The Bayes rate is $P^*$, and the lower bound on $P$ is $P^*$ itself. The upper bound is about twice the Bayes rate.

18. Can you demonstrate that a statistics-based classifier usually cannot lead to a classification accuracy of 100%?

19. What is representation-based classification? Please present the characteristics of representation-based classification.
20. A simple representation-based classification method is presented as follows.

This method seeks to represent the test sample as a linear combination of all training samples and uses the representation result to classify the test sample:

$y = \sum_{i=1}^{M} b_i \tilde{x}_i$, (1)

where $\tilde{x}_i$ ($i = 1, 2, \dots, M$) denote all the training samples and $b_i$ ($i = 1, 2, \dots, M$) are the coefficients. We rewrite Eq. (1) as

$y = \tilde{X} B$, (2)

where $B = [b_1 \dots b_M]^T$ and $\tilde{X} = [\tilde{x}_1 \dots \tilde{x}_M]$. If $\tilde{X}^T \tilde{X}$ is not singular, we can solve for $B$ using $B = (\tilde{X}^T \tilde{X})^{-1} \tilde{X}^T y$; otherwise, we can solve for it using

$B = (\tilde{X}^T \tilde{X} + \mu I)^{-1} \tilde{X}^T y$, (3)

where $\mu$ is a positive constant and $I$ is the identity matrix. After we obtain $B$, we refer to $\tilde{X} B$ as the representation result of our method. We can convert the representation result into a two-dimensional image having the same size as the original sample image.

To classify the test sample, we use the sum of the contributions that the training samples of each class make to representing it. For example, if all the training samples from the $r$-th ($r \in C$) class are $\tilde{x}_s, \dots, \tilde{x}_t$, then the sum of the contributions of the $r$-th class to representing the test sample is

$g_r = b_s \tilde{x}_s + \dots + b_t \tilde{x}_t$. (4)

We calculate the deviation of $g_r$ from $y$ using

$D_r = \|y - g_r\|^2, \quad r \in C$. (5)

We can also convert $g_r$ into a two-dimensional matrix having the same size as the original sample image. If we do so, we refer to the matrix as the two-dimensional image corresponding to the contribution of the $r$-th class. The smaller the deviation $D_r$, the greater the contribution of the $r$-th class to representing the test sample. In other words, if $D_q = \min_r D_r$ ($r \in C$), the test sample is classified into the $q$-th class.

From the above presentation, we see that the representation-based classification method is a novel method that is totally different from previous classifiers. It performs very well in image-based classification, such as face recognition and palmprint recognition. We should understand its nature and advantages.
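Equations (3) to (5) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the regularization symbol is written `mu` with a default of 0.01 chosen arbitrarily, the helper names `solve`, `dot` and `classify` are hypothetical, and the toy vectors stand in for vectorized images.

```python
def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def classify(train, labels, y, mu=0.01):
    """train: training vectors x_i; labels: class of each x_i; y: test vector."""
    m = len(train)
    # Eq. (3): solve (X^T X + mu I) B = X^T y, training samples being the columns of X.
    gram = [[dot(train[i], train[j]) + (mu if i == j else 0.0) for j in range(m)]
            for i in range(m)]
    B = solve(gram, [dot(x, y) for x in train])
    deviations = {}
    for r in sorted(set(labels)):
        # Eq. (4): g_r is the weighted sum of the training samples of class r.
        g = [0.0] * len(y)
        for b_i, x, lab in zip(B, train, labels):
            if lab == r:
                g = [gk + b_i * xk for gk, xk in zip(g, x)]
        # Eq. (5): D_r = ||y - g_r||^2.
        deviations[r] = sum((yk - gk) ** 2 for yk, gk in zip(y, g))
    # The class with the smallest deviation wins.
    return min(deviations, key=deviations.get), deviations


train = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [0.0, 1.0, 0.9], [0.1, 0.9, 1.0]]
labels = [1, 1, 2, 2]
label, dev = classify(train, labels, [1.0, 0.05, 0.05])
print(label)  # prints 1
```

Here there are more training samples than feature dimensions, so $\tilde{X}^T \tilde{X}$ is singular and the regularized form of Eq. (3) is the one that applies.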
21. Please describe the difference between linear and nonlinear discriminant functions. What potential advantage does a nonlinear discriminant function have in comparison with a linear discriminant function?

Answer:
I. Simply put, the graph of a linear discriminant function is a line or a plane, whereas the graph of a nonlinear discriminant function is a curve or a curved surface.
II. In practice, many pattern recognition problems are not linearly separable, and a nonlinear classifier should be designed for them. For example, when the sample distributions of two classes are multimodal and interleaved, a simple linear discriminant function tends to produce large classification errors.

The figure accompanying this question is only an aid.

22. What is the naive Bayes rule?

Answer: Naive Bayes classification is a very simple classification algorithm. The idea behind it is this: for the item to be classified, compute the probability of each class given that this item is observed, and assign the item to the class with the largest probability. Strictly, the name refers to the simplifying assumption that the features are conditionally independent given the class. Informally, if you are asked to guess where a stranger comes from and have only one observed attribute to go on, you guess the origin that is most probable given that attribute. Other origins are possible, but with no other information available we choose the class with the largest conditional probability.

23. What is the difference between supervised and unsupervised learning methods? Please give two examples each of supervised and unsupervised learning methods.

24. In some special real-world classification applications, Bayesian decision theory might perform badly. What are the possible reasons?

25. Suppose that we are applying a linear discriminant function to a nonlinearly separable problem. What means can we adopt to obtain an optimal solution?

26. Please present the possible generalization capability of a method in the sample space.

27. Apply the model $Ya = b$ to perform classification.

28. How to extend the binary minimum squared error procedure to the multiclass minimum squared error procedure?