BMC Med Inform Decis Mak. 2012; 12: 8.
Predicting sample size required for classification performance
Rosa L Figueroa
1Dep. Ing. Eléctrica, Facultad de Ingeniería, Universidad de Concepción, Concepción, Chile
Qing Zeng-Treitler
2Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA
Sasikiran Kandula
2Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA
Long H Ngo
3Department of Medicine, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, MA, USA
Received 2011 Jun 30; Accepted 2012 Feb 15.
Abstract
Background
Supervised learning methods need annotated data in order to generate efficient models. Annotated data, however, is a relatively scarce resource and can be expensive to obtain. For both passive and active learning methods, there is a need to estimate the size of the annotated sample required to reach a performance target.
Methods
We designed and implemented a method that fits an inverse power law model to points of a given learning curve created using a small annotated training set. Fitting is carried out using nonlinear weighted least squares optimization. The fitted model is then used to predict the classifier's performance and confidence interval for larger sample sizes. For evaluation, the nonlinear weighted curve fitting method was applied to a set of learning curves generated using clinical text and waveform classification tasks with active and passive sampling methods, and predictions were validated using standard goodness of fit measures. As control we used an un-weighted fitting method.
Results
A total of 568 models were fitted and the model predictions were compared with the observed performances. Depending on the data set and sampling method, it took between 80 and 560 annotated samples to achieve mean absolute error and root mean squared error below 0.01. Results also show that our weighted fitting method outperformed the baseline un-weighted method (p < 0.05).
Conclusions
This paper describes a simple and effective sample size prediction algorithm that conducts weighted fitting of learning curves. The algorithm outperformed an un-weighted algorithm described in previous literature. It can help researchers determine annotation sample size for supervised machine learning.
Background
The availability of biomedical data has increased during the past decades. In order to process such data and extract useful information from it, researchers have been using machine learning techniques. However, to generate predictive models, supervised learning techniques need an annotated training sample. Literature suggests that the predictive power of the classifiers is largely dependent on the quality and size of the training sample [1-6].
Human annotated data is a scarce resource and its creation is expensive both in terms of money and time. For example, un-annotated clinical notes are abundant. Labeling un-annotated text corpora from the clinical domain, however, requires a group of reviewers with domain expertise, and only a tiny fraction of the available clinical notes can be annotated.
The process of creating an annotated sample is initiated by selecting a subset of data; the question is: what should the size of the training subset be to reach a certain target classification performance? Or to phrase it differently: what is the expected classification performance for a given training sample size?
Problem formulation
Our interest in sample size prediction stemmed from our experiments with active learning. Active learning is a sampling technique that aims to minimize the size of the training set for classification. The main goal of active learning is to achieve, with a smaller training set, a performance comparable to that of passive learning. In the iterative process, users need to make a decision on when to stop/continue the data labeling and classification process. Although termination criteria are an issue for both passive and active learning, identifying an optimal termination point and training sample size may be more important in active learning. This is because the passive and active learning curves will, given a sufficiently large sample size, eventually converge and thus diminish the advantage of active learning over passive learning. Relatively few papers have been published on the termination criteria for active learning [7-9]. The published criteria are generally based on target accuracy, classifier confidence, uncertainty estimation, and minimum expected error. As such, they do not directly predict a sample size. In addition, depending on the algorithm and classification task, active learning algorithms differ in performance and sometimes can perform even worse than passive learning. In our prior work on medical text classification, we have investigated and experimented with several active learning sampling methods and observed the need to predict future classification performance for the purpose of selecting the best sampling algorithm and sample size [10,11]. In this paper we present a new method that predicts the performance at an increased sample size. This method models the observed classifier performance as a function of the training sample size, and uses the fitted curve to forecast the classifier's future behaviour.
Previous and related work
Sample size determination
Our method can be viewed as a type of sample size determination (SSD) method that determines sample size for study design. There are a number of different SSD methods to meet researchers' specific data requirements and goals [12-14]. Determining the sample size required to achieve sufficient statistical power to reject a null hypothesis is a standard approach [13-16]. Cohen defines statistical power as the probability that a test will "yield statistically significant results", i.e. the probability that the null hypothesis will be rejected when the alternative hypothesis is true [17]. These SSD methods have been widely used in bioinformatics and clinical studies [15,18-21]. Other methods attempt to find the sample size needed to reach a target performance (e.g. a high correlation coefficient) [22-25]. Within this category we find methods that predict the sample size required for a classifier to attain a particular accuracy [2,4,26]. There are two main approaches to predict the sample size required to achieve a specific classifier performance: Dobbin et al. describe a "model-based" approach to predict the number of samples needed for classifying microarray data [2]. It determines sample size based on standardized fold change, class prevalence, and number of genes or features on the arrays. Another, more generic approach is to fit a classifier's learning curve created using empirical data to inverse power law models. This approach is based on the findings from prior studies which showed that classifier learning curves generally follow the inverse power law [27]. Examples of this approach include the algorithms proposed by Mukherjee and others [1,28-30]. Since our proposed method is a variant of this approach, we will describe the prior work on learning curve fitting in more detail.
Learning curve fitting
A learning curve is a collection of data points (x_j, y_j) that in this case describe how the performance of a classifier (y_j) is related to training sample sizes (x_j), where j = 1 to m, m being the total number of data points. These learning curves can typically be divided into three sections: in the first section, the classification performance increases rapidly with an increase in the size of the training set; the second section is characterized by a turning point where the increase in performance is less rapid; and in the final section the classifier has reached its efficiency threshold, i.e. no (or only marginal) improvement in performance is observed with increasing training set size. Figure 1 is an example of a learning curve.
Mukherjee et al. experimented with fitting inverse power laws to empirical learning curves to forecast the performance at larger sample sizes [1]. They also discussed a permutation test procedure to assess the statistical significance of classification performance for a given dataset size. The method was tested on several relatively small microarray data sets (n = 53 to 280). The differences between the predicted and actual classification errors were found to be in the range of 1%-7%. Boonyanunta et al., on the other hand, conducted the curve fitting on several much larger datasets (n = 1,000) using a nonlinear model consistent with the inverse power law [28]. The mean absolute errors were very small, generally below 1%. Our proposed method is similar to that discussed in Mukherjee et al. with a couple of differences: 1) we conducted weighted curve fitting to favor future predictions; 2) we calculated the confidence interval for the fitted curve rather than fitting two additional curves for the lower and upper quartile data points.
Progressive sampling
Another research area related to our work is progressive sampling. Both active learning and progressive sampling start with a very small batch of instances and progressively increase the training data size until a termination criterion is met [31-36]. Active learning algorithms seek to select the most informative cases for training. Several of the learning curves used in this paper were generated using active learning techniques. Progressive sampling, on the other hand, focuses more on minimizing the amount of computation for a given performance target. For instance, Provost et al. proposed progressive sampling using a geometric progression-based sampling schedule [31]. They also explored convergence detection methods for progressive sampling and selected a convergence method that uses linear regression with local sampling (LRLS). In LRLS, the slope of a linear regression line built with r points sampled around the neighborhood of the last sample size is compared to zero. If it is close enough to zero, convergence is detected. The main difference between progressive sampling and SSD of classifiers is that progressive sampling assumes there is an unlimited number of annotated samples and does not predict the sample size required to reach a specific performance target.
Methods
In this section we describe a new fitting algorithm to predict classifier performance based on a learning curve. This algorithm fits an inverse power law model to a small set of initial points of a learning curve with the purpose of predicting a classifier's performance at larger sample sizes. Evaluation was carried out on 12 learning curves at dozens of sample sizes for model fitting, and predictions were validated using standard goodness of fit measures.
Algorithm description
The algorithm to model and predict a classifier's performance contains three steps:
1) Learning curve creation;
2) Model fitting;
3) Sample size prediction.
Learning curve creation
Assuming the target performance measure is classification accuracy, a learning curve that characterizes classification accuracy (Y_acc) as a function of the training set size (X) is created. To obtain the data points (x_j, y_j), classifiers are created and tested at increasing training set sizes x_j. With a batch size k, x_j = k·j, j = 1, 2, ..., m, i.e. x_m = k·m. Classification accuracy points (y_j), i.e. the proportion of correctly classified samples, can be calculated at each training sample size x_j using an independent test set or through n-fold cross validation.
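For illustration, the following R sketch implements this step on a synthetic two-feature dataset, with a logistic regression classifier standing in for the SVM used in our experiments; the data, helper names, and parameter values are illustrative assumptions, not the exact code from Appendix 1.

```
# Sketch of learning curve creation: accuracy at increasing training sizes.
set.seed(42)
make_data <- function(n) {
  d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
  d$y <- factor(ifelse(d$x1 + d$x2 + rnorm(n) > 0, "pos", "neg"))
  d
}
train <- make_data(2000)
test  <- make_data(1000)

# Accuracy of a classifier trained on s randomly sampled (passive) instances.
accuracy_at <- function(s) {
  idx  <- sample(nrow(train), s)
  fit  <- glm(y ~ x1 + x2, data = train[idx, ], family = binomial)
  pred <- ifelse(predict(fit, test, type = "response") > 0.5, "pos", "neg")
  mean(pred == test$y)
}

k     <- 16          # batch size, as in our experiments
sizes <- k * (1:60)  # x_j = k * j
curve_pts <- data.frame(x = sizes, y = sapply(sizes, accuracy_at))
```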
Model fitting and parameter identification
Learning curves can generally be represented using inverse power law functions [1,27,37,38]. Equation (1) describes the classifier's accuracy (Y_acc) as a function of the training sample size x, with the parameters a, b, and c representing the minimum achievable error, learning rate and decay rate respectively. The values of the parameters are expected to differ depending on the dataset, sampling method and the classification algorithm. Still, values for parameter c are expected to be negative within the range [-1,0]; values for a are expected to be much smaller than 1. The values of Y_acc fall between 0 and 1. Y_acc grows asymptotically to the maximum achievable performance, in this case (1-a).
Y_acc(x) = f(x; a, b, c) = (1 - a) - b · x^c     (1)
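To make the shape of Eq. (1) concrete, the short R sketch below plots the model for assumed example parameter values (not fitted values); the dashed line marks the asymptote (1-a).

```
# Illustration of Eq. (1): accuracy approaches (1 - a) as x grows.
a <- 0.05; b <- 1.2; cexp <- -0.6   # assumed example parameters
x <- seq(16, 2000, by = 16)
y <- (1 - a) - b * x^cexp
plot(x, y, type = "l", ylim = c(0, 1),
     xlab = "training set size", ylab = "accuracy")
abline(h = 1 - a, lty = 2)          # maximum achievable accuracy
```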
Let us define the set Ω as the collection of data points on an empirical learning curve corresponding to (X, Y_acc(X)). Ω can be partitioned into two subsets: Ω_t to fit the model, and Ω_v to validate the fitted model. Please note that in real life applications only Ω_t will be available. For example, at sample size x_s, Ω_t = {(x_j, y_j) | x_j ≤ x_s} and Ω_v = {(x_j, y_j) | x_j > x_s}.
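A minimal sketch of this partition, assuming the curve points from the earlier sketch are stored in curve_pts:

```
# Partition the curve at sample size x_s; only omega_t would be
# available in a real application.
x_s <- 16 * 5   # e.g. the first five points (80 instances)
omega_t <- subset(curve_pts, x <= x_s)
omega_v <- subset(curve_pts, x >  x_s)
```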
Using Ω_t, we applied nonlinear weighted least squares optimization together with the nl2sol routine from the Port library [39] to fit the mathematical model from Eq. (1) and find the parameter vector {a, b, c}.
We also assigned weights to the data points in Ω_t. As described earlier, data points on the learning curve are associated with sample sizes; we postulated that the classifier performance at a larger training sample size is more indicative of the classifier's future performance. To account for this, a data point (x_j, y_j) ∈ Ω_t is assigned the normalized weight j/m, where m is the cardinality of Ω.
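The sketch below shows one way to carry out the weighted fit in R. nls() with algorithm = "port" uses the nl2sol routine cited above; the starting values and parameter bounds are assumptions for illustration, not necessarily the settings used in Appendix 1.

```
# Weighted fit of the inverse power law model (Eq. 1) on omega_t.
m   <- nrow(curve_pts)              # cardinality of the full curve
wts <- seq_len(nrow(omega_t)) / m   # normalized weights j/m

fit <- nls(y ~ (1 - a) - b * x^c,
           data      = omega_t,
           start     = list(a = 0.05, b = 1, c = -0.5),  # assumed starts
           weights   = wts,
           algorithm = "port",      # nl2sol-based, supports bounds
           lower     = c(0, 0, -1), # a >= 0, b >= 0, c in [-1, 0]
           upper     = c(1, Inf, 0))
coef(fit)                           # fitted parameters {a, b, c}
```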
Performance prediction
In this step, the mathematical model (Eq. (1)) together with the estimated parameters {a, b, c} is applied to unseen sample sizes and the resulting prediction is compared with the data points in Ω_v. In other words, the fitted curve is used to extrapolate the classifier's performance at larger sample sizes. Additionally, the 95% confidence interval of the estimated accuracy ŷ_s is calculated by using the Hessian matrix and the second-order derivatives of the function describing the curve. See appendix 1 (additional file 1) for more details on the implementation of the methods.
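The sketch below approximates this step with a delta-method interval built from the parameter variance-covariance matrix of the fit; it is an approximation in the same spirit as, but not necessarily identical to, the Hessian-based computation in Appendix 1.

```
# Approximate 95% CI for predicted accuracy at new sample sizes.
predict_with_ci <- function(fit, x_new) {
  p <- coef(fit)     # a, b, c
  V <- vcov(fit)     # parameter variance-covariance matrix
  yhat <- (1 - p["a"]) - p["b"] * x_new^p["c"]
  # gradient of f(x; a, b, c) with respect to (a, b, c)
  g <- cbind(da = rep(-1, length(x_new)),
             db = -x_new^p["c"],
             dc = -p["b"] * x_new^p["c"] * log(x_new))
  se <- sqrt(rowSums((g %*% V) * g))   # sqrt(g' V g) per point
  data.frame(x = x_new, fit = as.numeric(yhat),
             lwr = as.numeric(yhat) - 1.96 * se,
             upr = as.numeric(yhat) + 1.96 * se)
}
predict_with_ci(fit, c(1200, 1600, 2000))
```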
Evaluation
Datasets
We evaluated our algorithm using 3 sets of data. In the first two sets (D1 and D2), observations are smoking-related sentences from a set of patient discharge summaries from the Partners HealthCare research patient data repository (RPDR). Each observation was manually annotated with smoking status. D1 contains 7,016 sentences and 350 word features to distinguish between smokers (5,333 sentences) and non smokers (1,683 sentences). D2 contains 8,449 sentences and 350 word features to discriminate between past smokers (5,109 sentences) and current smokers (3,340 sentences).
The third data set (D3) is the waveform-5000 dataset from the UCI machine learning repository [40], which contains 5,000 instances, 21 features and three classes of waves (1,657 instances of w1, 1,647 of w2, and 1,696 of w3). The classification goal is to perform binary classification to discriminate the first class of waves from the other two.
Each dataset was randomly split into a training set and a testing set. Test sets for D1 and D2 contained 1,000 instances each, while 2,500 instances were set apart as the test set in D3. On the 3 datasets, we used 4 different sampling methods - three active learning algorithms and a random selection (passive) - together with a support vector machine classifier with linear kernel from WEKA [41] (complexity constant was set to 1, epsilon set to 1.0E-12, tolerance parameter 1.0E-3, and normalization/standardization options were turned off) to generate a total of 12 actual learning curves for Y_acc. The active learning methods used are:
• Distance (DIST), a simple margin method which samples training instances based on their proximity to the support vector machine (SVM) hyperplane;
• Diversity (DIV), which selects instances based on their diversity/dissimilarity from instances in the training set. Diversity is measured as the simple cosine distance between the candidate instances and the already selected set of instances, in order to reduce information redundancy; and
• Combined method (CMB), which is a combination of both DIST and DIV methods.
The initial sample size is set to 16 with an increment size of 16 as well, i.e. k = 16. Detailed information about the three algorithms can be found in appendix 2 (see additional file 2) and in the literature [10,35,42].
Each experiment was repeated 100 times and Y_acc was averaged at each batch size over the 100 runs to obtain the data points (x_j, y_j) of the learning curve.
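A sketch of this averaging step, reusing the hypothetical accuracy_at() helper from the earlier learning-curve sketch:

```
# Average accuracy over repeated runs (we used 100 repetitions) to
# obtain smoother curve points (x_j, y_j).
curve_pts <- data.frame(
  x = sizes,
  y = sapply(sizes, function(s) mean(replicate(100, accuracy_at(s)))))
```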
Goodness of fit measures
Two goodness of fit measures, mean absolute error (MAE) (Eq. (2)) and root mean squared error (RMSE) (Eq. (3)), were used to evaluate the fitted function on Ω_v. MAE is the average absolute value of the difference between the observed accuracy (y_j) and the predicted accuracy (ŷ_j). RMSE is the square root of the average squared difference between the observed accuracy (y_j) and the predicted accuracy (ŷ_j). RMSE and MAE values close to zero indicate a better fit. Using |Ω_v| to stand for the cardinality of Ω_v, MAE and RMSE are computed as follows:

MAE = (1/|Ω_v|) · Σ_{(x_j, y_j) ∈ Ω_v} |y_j - ŷ_j|     (2)

RMSE = sqrt( (1/|Ω_v|) · Σ_{(x_j, y_j) ∈ Ω_v} (y_j - ŷ_j)^2 )     (3)
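Both measures are straightforward to compute on the validation subset; a sketch reusing the fit and omega_v objects from the earlier sketches:

```
# Goodness of fit (Eqs. 2 and 3) on the held-out validation points.
y_hat <- predict(fit, newdata = omega_v)
mae   <- mean(abs(omega_v$y - y_hat))        # Eq. (2)
rmse  <- sqrt(mean((omega_v$y - y_hat)^2))   # Eq. (3)
```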
On each curve, we started the curve fitting and prediction experiment at |Ω_t| = 5, i.e. at the sample size of 80 instances. In subsequent experiments, |Ω_t| was increased by 1 until it reached 62 points, i.e. at the sample size of 992 instances.
To evaluate our method, we used as baseline the non-weighted least squares optimization algorithm described by Mukherjee et al. [1]. A paired t-test was used to compare the RMSE and MAE between both methods for all experiments. The alternative hypothesis is that the means of the RMSE and MAE of the baseline method are greater than those of our weighted fitting method.
Results
Using the 3 datasets and 4 sampling methods, 12 actual learning curves were generated. We fitted the inverse power law model to each of the curves, using an increasing number of data points (m = 80-992 in D1 and D2, m = 80-480 in D3). A total of 568 experiments were conducted. In each experiment, the predicted performance was compared to the actual observed performance.
Figure 2 shows the curve fitting and prediction results for the random sampling learning curve using D2 data at different sample sizes. In Figure 2a the curve was fitted using six data points; the predicted curve (blue) deviates slightly from the actual data points (black), though the actual data points do fall in the relatively large confidence interval (red). As expected, the deviation and confidence interval are both larger as we project further into the larger sample sizes. In 2b, with 11 data points for fitting, the predicted curve closely resembles the observed data and the confidence interval is much narrower. In 2c, with 22 data points, the predicted curve is even closer to the actual observations with a very narrow confidence interval.
Figure 2. Progression of online curve fitting for the learning curve of the dataset D2-RAND.
Figure 3 illustrates the width of the confidence interval and MAE at various sample sizes. When the model is fitted with a small number of annotated samples, we can observe that the confidence interval width and MAE in most of the cases have larger values. As the sample size increases and the prediction accuracy improves, both confidence interval width and MAE values get smaller, with a couple of exceptions. At large sample sizes, confidence intervals are very narrow and residual values very small. Both Figures 2 and 3 suggest that the confidence interval width relates to MAE and prediction accuracy.
Figure 3. Progression of confidence interval width and MAE for predicted values.
Similarly, Figure 4 shows RMSE for the predicted values on the 12 learning curves with gradually increasing sample sizes used for curve fitting. Regarding fitting sample sizes, we can observe a rapid decrease in RMSE and MAE from 80 to 200 instances. From 200 to the end of the curves, values stay relatively constant and close to zero, with a few exceptions. The smallest MAE and RMSE were obtained from the D3 dataset on all the learning curves, followed by the learning curves on the D2 dataset. For all datasets RMSE and MAE have similar values, with RMSE sometimes being slightly larger.
Figure 4. RMSE for predicted values on the three datasets.
In Figures 2 and 5, it can be observed that the width of the observed confidence intervals changes only slightly along the learning curves, showing that performance variance among experiments is not strongly impacted by the sample size. On the other hand, the predicted confidence interval narrows dramatically as more samples are used and the prediction becomes more accurate.
Figure 5. Progression of confidence interval widths for the observed values (training set) and the predicted values.
We also compared our algorithm with the un-weighted algorithm. Table 1 shows average values of RMSE for the baseline un-weighted and our weighted method; min and max values are also provided. In all cases, our weighted fitting method had lower RMSE than the baseline method, with the exception of one tie. We pooled the RMSE values and conducted a paired t-test. The difference between the weighted fitting method and the baseline method is statistically significant (p < 0.05). We conducted a similar analysis comparing the MAE between the two methods and obtained similar results.
Table 1
Average RMSE (%) for baseline and weighted fitting methods.

| Curve | Weighted [min - max] | Baseline [min - max] | P |
|---|---|---|---|
| D1-DIV | 1.52 [0.04 - 8.44] | 2.57 [0.82 - 8.70] | 2.7E-44 |
| D1-CMB | 0.60 [0.06 - 4.61] | 1.15 [0.44 - 4.94] | 2.7E-32 |
| D1-DIST | 0.61 [0.09 - 5.25] | 1.16 [0.22 - 5.50] | 1.9E-22 |
| D1-RND | 1.15 [0.10 - 11.37] | 2.01 [0.38 - 11.29] | 8.2E-19 |
| D2-DIV | 1.33 [0.28 - 3.95] | 1.63 [0.73 - 3.53] | 4.6E-09 |
| D2-CMB | 0.29 [0.01 - 0.67] | 0.38 [0.19 - 0.76] | 3.3E-04 |
| D2-DIST | 0.39 [0.04 - 1.74] | 0.50 [0.22 - 2.11] | 2.7E-03 |
| D2-RND | 0.46 [0.13 - 4.99] | 0.56 [0.16 - 4.44] | 6.1E-04 |
| D3-DIV | 0.34 [0.05 - 1.22] | 0.43 [0.04 - 0.93] | 4.6E-02 |
| D3-CMB | 0.47 [0.09 - 1.66] | 0.65 [0.21 - 1.60] | 6.0E-09 |
| D3-DIST | 0.38 [0.10 - 1.24] | 0.49 [0.20 - 1.21] | 5.1E-10 |
| D3-RND | 0.32 [0.15 - 1.48] | 0.32 [0.11 - 1.75] | 6.3E-01 |
Paired Student's t-test conducted on the values of RMSE found the weighted fitting method statistically better than the baseline method (p < 0.05).
Discussion
In this paper we described a relatively simple method to predict a classifier's performance for a given sample size, through the creation and modelling of a learning curve. As prior research suggests, the learning curves of machine classifiers generally follow the inverse power law [1,27]. Given the purpose of predicting future performance, our method assigned higher weights to data points associated with larger sample sizes. In evaluation, the weighted method resulted in more accurate prediction (p < 0.05) than the un-weighted method described by Mukherjee et al.
The evaluation experiments were conducted on free text and waveform data, using passive and active learning algorithms. Prior studies typically used a single type of data (e.g. microarray or text) and a single type of sampling algorithm (i.e. random sampling). By using a variety of data and sampling methods, we were able to test our method on a diverse collection of learning curves and assess its generalizability. For the majority of curves, the RMSE fell below 0.01 with a relatively modest sample size of 200 used for curve fitting. We observed minimal differences between values of RMSE and MAE, which indicates a low variance of the errors.
Our method also provides the confidence intervals of the predicted curves. As shown in Figure 2, the width of the confidence interval negatively correlates with the prediction accuracy. When the predicted value deviates more from the actual observation, the confidence interval tends to be wider. As such, the confidence interval provides an additional measure to help users make the decision in selecting a sample size for additional annotation and classification. In our study, confidence intervals were calculated using a variance-covariance matrix on the fitted parameters. Prior studies have stated that the variance is not an unbiased estimator when a model is tested on new data [1]. Hence, our confidence intervals may sometimes be optimistic.
A major limitation of the method is that an initial set of annotated data is needed. This is a shortcoming shared by other SSD methods for machine classifiers. On the other hand, depending on what confidence interval is deemed acceptable, the initial annotated sample can be of moderate size (e.g. n = 100~200).
The initial set of annotated data is used to create a learning curve. The curve contains j data points with a starting sample size of k0 and a step size of k. The full sample size m = k0 + (j-1)·k. The values of k0 and k are determined by users. When k0 and k are assigned the same value, m = j·k. For example, with k0 = k = 16 and j = 5 points, m = 80, which matches the starting point used in our experiments. In active learning, a typical experiment may assign k0 as 16 or 32 and k as 16 or 32. For very small data sets, one may consider using k0 = 4 and k = 4. Empirically, we found that j needed to be greater than or equal to 5 for the curve fitting to be effective.
In many studies, as well as ours, the learning curves appear to be smooth because each data point on the curve is assigned the average value from multiple experiments (e.g. 10-fold cross validation repeated 100 times). With fewer experiments (e.g. one round of training and testing per data point), the curve will not be as smooth. We expect the model fitting to be more accurate and the confidence interval to be narrower on smoother curves, though the fitting process remains the same for the less smooth curves.
Although the curve fitting can be done in real time, the time to create the learning curve depends on the classification task, batch size, feature number, and processing speed of the machine, among other factors. The longest experiment we performed to create a learning curve using active learning as the sample selection method ran on a single core laptop for several days, though most experiments needed only a few hours.
For future work, we intend to integrate the function to predict sample size into our NLP software. The purpose is to guide users in text mining and annotation tasks. In clinical NLP research, annotation is usually expensive and the sample size decision is often made based on budget rather than expected performance. It is common for researchers to select an initial number of samples in an ad hoc fashion to annotate data and train a model. They then increase the number of annotations if the target performance could not be reached, based on the vague but mostly correct belief that performance will improve with a larger sample size. The amount of improvement, though, cannot be known without the modelling effort we describe in this paper. Predicting the classification performance for a particular sample size would allow users to evaluate the cost effectiveness of additional annotations in study design. Specifically, we plan for it to be incorporated as part of an active learning and/or interactive learning process.
Conclusions
This paper describes a simple sample size prediction algorithm that conducts weighted fitting of learning curves. When tested on free text and waveform classification with active and passive sampling methods, the algorithm outperformed the un-weighted algorithm described in previous literature in terms of goodness of fit measures. This algorithm can help users make an informed decision in sample size selection for machine learning tasks, especially when annotated data are expensive to obtain.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
QZ and RLF conceived the study. SK and RLF designed and implemented experiments. SK and RLF analyzed data and performed statistical analysis. QZ and LN participated in study design and supervised experiments and data analysis. RLF drafted the manuscript. Both SK and QZ had full access to all of the data and made critical revisions to the manuscript. All authors read and approved the final manuscript.
Supplementary Material
Additional file 1:
Appendix 1 is a PDF file with the main lines of R code that implements curve fitting using inverse power models.
Additional file 2:
Appendix 2 is a PDF file that contains more details about the active learning methods used to generate the learning curves.
Acknowledgements
The authors wish to acknowledge CONICYT (Chilean National Council for Science and Technology Research), the MECESUP program, and Universidad de Concepción for their support of this research. This research was funded in part by CHIR HIR 08-374 and VINCI HIR-08-204.
References
- Mukherjee S, Tamayo P, Rogers S, Rifkin R, Engle A, Campbell C, Golub TR, Mesirov JP. Estimating dataset size requirements for classifying DNA microarray data. J Comput Biol. 2003;10(2):119–142. doi: 10.1089/106652703321825928.
- Dobbin K, Zhao Y, Simon R. How Large a Training Set is Needed to Develop a Classifier for Microarray Data? Clinical Cancer Research. 2008;14(1):108–114. doi: 10.1158/1078-0432.CCR-07-0443.
- Tam VH, Kabbara S, Yeh RF, Leary RH. Impact of sample size on the performance of multiple-model pharmacokinetic simulations. Antimicrobial Agents and Chemotherapy. 2006;50(11):3950–3952. doi: 10.1128/AAC.00337-06.
- Kim S-Y. Effects of sample size on robustness and prediction accuracy of a prognostic gene signature. BMC Bioinformatics. 2009;10(1):147. doi: 10.1186/1471-2105-10-147.
- Kalayeh HM, Landgrebe DA. Predicting the Required Number of Training Samples. Pattern Analysis and Machine Intelligence, IEEE Transactions on. 1983;5(6):664–667.
- Nigam K, McCallum AK, Thrun S, Mitchell T. Text Classification from Labeled and Unlabeled Documents using EM. Mach Learn. 2000;39(2-3):103–134.
- Vlachos A. A stopping criterion for active learning. Computer Speech and Language. 2008;22(3):295–312. doi: 10.1016/j.csl.2007.12.001.
- Olsson F, Tomanek K. Proceedings of the Thirteenth Conference on Computational Natural Language Learning. Boulder, Colorado: Association for Computational Linguistics; 2009. An intrinsic stopping criterion for committee-based active learning; pp. 138–146.
- Zhu J, Wang H, Hovy E, Ma M. Confidence-based stopping criteria for active learning for data annotation. ACM Transactions on Speech and Language Processing (TSLP). 2010;6(3):1–24. doi: 10.1145/1753783.1753784.
- Figueroa RL, Zeng-Treitler Q. Poster session presented at: AMIA 2009 Annual Symposium in Biomedical and Health Informatics. San Francisco, CA, USA; 2009. Exploring Active Learning in Medical Text Classification.
- Kandula S, Figueroa R, Zeng-Treitler Q. Poster session presented at: MEDINFO 2010 13th World Congress on Medical Informatics. Cape Town, South Africa; 2010. Predicting Outcome Measures in Active Learning.
- Maxwell SE, Kelley K, Rausch JR. Sample size planning for statistical power and accuracy in parameter estimation. Annual Review of Psychology. 2008;59:537–563. doi: 10.1146/annurev.psych.59.103006.093735.
- Adcock CJ. Sample size determination: a review. Journal of the Royal Statistical Society: Series D (The Statistician). 1997;46(2):261–283. doi: 10.1111/1467-9884.00082.
- Lenth RV. Some Practical Guidelines for Effective Sample Size Determination. The American Statistician. 2001;55(3):187–193. doi: 10.1198/000313001317098149.
- Briggs AH, Gray AM. Power and Sample Size Calculations for Stochastic Cost-Effectiveness Analysis. Medical Decision Making. 1998;18(2):S81–S92. doi: 10.1177/0272989X9801800210.
- Carneiro AV. Estimating sample size in clinical studies: basic methodological principles. Rev Port Cardiol. 2003;22(12):1513–1521.
- Cohen J. Statistical Power Analysis for the Behavioural Sciences. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988.
- Scheinin I, Ferreira JA, Knuutila S, Meijer GA, van de Wiel MA, Ylstra B. CGHpower: exploring sample size calculations for chromosomal copy number experiments. BMC Bioinformatics. 2010;11:331. doi: 10.1186/1471-2105-11-331.
- Eng J. Sample size estimation: how many individuals should be studied? Radiology. 2003;227(2):309–313. doi: 10.1148/radiol.2272012051.
- Walters SJ. Sample size and power estimation for studies with health related quality of life outcomes: a comparison of four methods using the SF-36. Health and Quality of Life Outcomes. 2004;2:26. doi: 10.1186/1477-7525-2-26.
- Cai J, Zeng D. Sample size/power calculation for case-cohort studies. Biometrics. 2004;60(4):1015–1024. doi: 10.1111/j.0006-341X.2004.00257.x.
- Algina J, Moulder BC, Moser BK. Sample Size Requirements for Accurate Estimation of Squared Semi-Partial Correlation Coefficients. Multivariate Behavioral Research. 2002;37(1):37–57. doi: 10.1207/S15327906MBR3701_02.
- Stalbovskaya V, Hamadicharef B, Ifeachor E. Sample Size Determination using ROC Analysis. 3rd International Conference on Computational Intelligence in Medicine and Healthcare (CIMED2007); 2007.
- Beal SL. Sample Size Determination for Confidence Intervals on the Population Mean and on the Difference Between Two Population Means. Biometrics. 1989;45(3):969–977. doi: 10.2307/2531696.
- Jiroutek MR, Muller KE, Kupper LL, Stewart PW. A New Method for Choosing Sample Size for Confidence Interval-Based Inferences. Biometrics. 2003;59(3):580–590. doi: 10.1111/1541-0420.00068.
- Fukunaga K, Hayes R. Effects of sample size in classifier design. Pattern Analysis and Machine Intelligence, IEEE Transactions on. 1989;11(8):873–885. doi: 10.1109/34.31448.
- Cortes C, Jackel LD, Solla SA, Vapnik V, Denker JS. Learning Curves: Asymptotic Values and Rate of Convergence. Vol. 6. San Francisco, CA, USA: Morgan Kaufmann Publishers; 1994.
- Boonyanunta N, Zeephongsekul P. Knowledge-Based Intelligent Information and Engineering Systems. Vol. 3215. Springer Berlin/Heidelberg; 2004. Predicting the Relationship Between the Size of Training Sample and the Predictive Power of Classifiers; pp. 529–535.
- Hess KR, Wei C. Learning Curves in Classification With Microarray Data. Seminars in Oncology. 2010;37(1):65–68. doi: 10.1053/j.seminoncol.2009.12.002.
- Last M. Proceedings of the Seventh IEEE International Conference on Data Mining Workshops. IEEE Computer Society; 2007. Predicting and Optimizing Classifier Utility with the Power Law; pp. 219–224.
- Provost F, Jensen D, Oates T. Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Diego, California, United States: ACM; 1999. Efficient progressive sampling.
- Warmuth MK, Liao J, Ratsch G, Mathieson M, Putta S, Lemmen C. Active learning with support vector machines in the drug discovery process. J Chem Inf Comput Sci. 2003;43(2):667–673. doi: 10.1021/ci025620t.
- Liu Y. Active learning with support vector machine applied to gene expression data for cancer classification. J Chem Inf Comput Sci. 2004;44(6):1936–1941. doi: 10.1021/ci049810a.
- Li M, Sethi IK. Confidence-based active learning. Pattern Analysis and Machine Intelligence, IEEE Transactions on. 2006;28(8):1251–1261.
- Brinker K. Incorporating Diversity in Active Learning with Support Vector Machines. Proceedings of the Twentieth International Conference on Machine Learning (ICML); 2003. pp. 59–66.
- Yuan J, Zhou X, Zhang J, Wang M, Zhang Q, Wang W, Shi B. Positive Sample Enhanced Angle-Diversity Active Learning for SVM Based Image Retrieval. Proceedings of the IEEE International Conference on Multimedia and Expo (ICME 2007); 2007. pp. 2202–2205.
- Yelle LE. The Learning Curve: Historical Review and Comprehensive Survey. Decision Sciences. 1979;10(2):302–327. doi: 10.1111/j.1540-5915.1979.tb00026.x.
- Ramsay C, Grant A, Wallace S, Garthwaite P, Monk A, Russell I. Statistical assessment of the learning curves of health technologies. Health Technology Assessment. 2001;5(12).
- Dennis JE, Gay DM, Welsch RE. Algorithm 573: NL2SOL - An Adaptive Nonlinear Least-Squares Algorithm [E4]. ACM Transactions on Mathematical Software. 1981;7(3):369–383. doi: 10.1145/355958.355966.
- UCI Machine Learning Repository. http://www.ics.uci.edu/~mlearn/MLRepository.html
- Weka---Machine Learning Software in Java. http://weka.wiki.sourceforge.net/
- Tong S, Koller D. Support Vector Machine Active Learning with Applications to Text Classification. Journal of Machine Learning Research. 2001;2:45–66.
Articles from BMC Medical Informatics and Decision Making are provided here courtesy of BioMed Central