Identification of Corrupted Data via $k$-Means Clustering for Function Approximation

Volume 2, Issue 1

Jun Hou, Yeonjong Shin & Dongbin Xiu

DOI: 10.4208/csiam-am.2020-0212

CSIAM Trans. Appl. Math., 2 (2021), pp. 81-107.

Published online: 2021-02


Abstract

In addition to measurement noise, real-world data are often corrupted by unexpected internal or external errors. Corruption errors can be much larger than standard noise and can negatively affect data processing results. In this paper, we propose a method for identifying corrupted data in the context of function approximation. The method is a two-step procedure consisting of an approximation stage and an identification stage. In the approximation stage, we apply straightforward function approximation to the entire data set as preliminary processing. In the identification stage, a clustering algorithm is applied to the processed data to identify potentially corrupted data entries. In particular, we found the $k$-means clustering algorithm to be highly effective. Our theoretical analysis reveals that, under sufficient conditions, the proposed method can exactly identify all corrupted data entries. Numerous examples are provided to verify our theoretical findings and demonstrate the effectiveness of the method.
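
The two-stage procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the approximation scheme or the quantity being clustered, so the sketch assumes a least-squares Legendre fit for the approximation stage and $k$-means with $k = 2$ applied to the absolute residuals for the identification stage. The function names (`fit_polynomial`, `two_means_1d`, `identify_corrupted`), the test function, the noise level, and the corruption size are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Approximation stage (assumed): least-squares Legendre fit to ALL data."""
    return np.polynomial.legendre.Legendre.fit(x, y, degree)

def two_means_1d(values, n_iter=100):
    """Lloyd's algorithm with k = 2 on scalar values.

    Returns a boolean mask that is True for members of the cluster
    with the larger centroid.
    """
    # Initialise the two centroids at the extremes of the data.
    centers = np.array([values.min(), values.max()], dtype=float)
    for _ in range(n_iter):
        # Assign each value to its nearest centroid.
        labels = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        new_centers = centers.copy()
        for j in range(2):
            if np.any(labels == j):
                new_centers[j] = values[labels == j].mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels == centers.argmax()

def identify_corrupted(x, y, degree=5):
    """Identification stage (assumed): cluster the absolute residuals."""
    model = fit_polynomial(x, y, degree)
    residuals = np.abs(y - model(x))
    return two_means_1d(residuals)

# Synthetic data: a smooth function with small noise and a few large corruptions.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = np.sin(np.pi * x) + 0.01 * rng.standard_normal(x.size)
corrupted_idx = np.array([20, 60, 100, 140, 180])  # hypothetical corrupted entries
y[corrupted_idx] += 10.0  # corruption much larger than the noise level

flagged = np.flatnonzero(identify_corrupted(x, y))
print("flagged entries:", flagged)
```

In this toy setting the corruption is large and sparse, so the residuals separate into two well-spaced groups and the larger-centroid cluster marks the suspect entries. When corruptions are small or numerous, the preliminary fit itself is distorted and the clusters can overlap, which is the kind of situation the paper's sufficient conditions are meant to rule out.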

Keywords

Data corruption, function approximation, sparse approximation, $k$-means clustering.

AMS Subject Headings

42C05, 41A10, 65D15


@Article{CSIAM-AM-2-81,
  author = {Hou, Jun and Shin, Yeonjong and Xiu, Dongbin},
  title = {Identification of Corrupted Data via $k$-Means Clustering for Function Approximation},
  journal = {CSIAM Transactions on Applied Mathematics},
  year = {2021},
  volume = {2},
  number = {1},
  pages = {81--107},
  issn = {2708-0579},
  doi = {https://doi.org/10.4208/csiam-am.2020-0212},
  url = {http://global-sci.org/intro/article_detail/csiam-am/18655.html}
}


Jun Hou, Yeonjong Shin & Dongbin Xiu. (2021). Identification of Corrupted Data via $k$-Means Clustering for Function Approximation. CSIAM Transactions on Applied Mathematics. 2 (1). 81-107. doi:10.4208/csiam-am.2020-0212
