R: Data: reading data from the internet


Download to a .csv file

When a site offers the data as a .csv file, downloading it is probably the easiest approach, but see the typical data conversion problems below.
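
A minimal sketch of this route, assuming the site exposes a direct link to a .csv file. The helper name 'fetch_csv' is hypothetical, and a small local file addressed by a file:// URL stands in for the remote file so that the example runs offline; 'download.file' and 'read.csv' are base R.

```r
# Hypothetical helper: download a CSV from 'url' to a local file, then read it.
fetch_csv <- function(url, dest = tempfile(fileext = ".csv")) {
  download.file(url, destfile = dest, quiet = TRUE)
  read.csv(dest)
}

# Stand-in for a remote file (an assumption so the sketch runs offline):
# a small local CSV addressed through a file:// URL.
src <- tempfile(fileext = ".csv")
writeLines(c("Date,Contract,Units",
             "06/01/08,DEM08_WTA,230",
             "06/01/08,REP08_WTA,108"), src)
demo_url <- paste0("file://", normalizePath(src, winslash = "/"))

dd <- fetch_csv(demo_url)
str(dd)
```

With a real site, 'demo_url' would be replaced by the http(s) address of the .csv file.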

Download as text

An easy way to get data as text is to copy and paste it from the web page into a file of source code, e.g. with Tinn-R, and then 'cat' the data into a temporary file. For example, using the Iowa Electronic Markets data at http://iemweb.biz.uiowa.edu/pricehistory/PriceHistory_GetData.cfm, we can paste the data as follows:

cat("
Date  	    Contract  	    Units  	    Volume  	    LowPrice  	    HighPrice  	    AvgPrice  	    LastPrice
06/01/08  	  DEM08_WTA  	 230  	 140.164  	 0.601  	 0.617  	 0.609  	 0.617
06/01/08 	    REP08_WTA 	108 	42.932 	0.397 	0.400 	0.398 	0.397
06/02/08 	    DEM08_WTA 	139 	83.820 	0.600 	0.609 	0.603 	0.604
06/02/08 	    REP08_WTA 	380 	156.543 	0.403 	0.423 	0.412 	0.414
06/03/08 	    DEM08_WTA 	318 	195.291 	0.604 	0.625 	0.614 	0.619
06/03/08 	    REP08_WTA 	338 	131.976 	0.380 	0.399 	0.390 	0.399
06/04/08 	    DEM08_WTA 	1,011 	625.431 	0.601 	0.624 	0.619 	0.622
06/04/08 	    REP08_WTA 	442 	172.914 	0.380 	0.409 	0.391 	0.399
06/05/08 	    DEM08_WTA 	266 	164.409 	0.613 	0.622 	0.618 	0.613
06/05/08 	    REP08_WTA 	173 	67.547 	0.383 	0.399 	0.390 	0.399
06/06/08 	    DEM08_WTA 	142 	86.333 	0.607 	0.614 	0.608 	0.607
06/06/08 	    REP08_WTA 	58 	22.620 	0.390 	0.390 	0.390 	0.390
06/07/08 	    DEM08_WTA 	1,073 	696.182 	0.600 	0.750 	0.649 	0.748
06/07/08 	    REP08_WTA 	253 	101.907 	0.395 	0.413 	0.403 	0.395
06/08/08 	    DEM08_WTA 	715 	474.275 	0.611 	0.710 	0.663 	0.700
06/08/08 	    REP08_WTA 	386 	148.504 	0.372 	0.395 	0.385 	0.377
06/09/08 	    DEM08_WTA 	1,174 	752.990 	0.600 	0.700 	0.641 	0.600
06/09/08 	    REP08_WTA 	1,586 	631.971 	0.365 	0.434 	0.398 	0.401
06/10/08 	    DEM08_WTA 	992 	606.554 	0.601 	0.650 	0.611 	0.610
06/10/08 	    REP08_WTA 	1,289 	504.324 	0.390 	0.400 	0.391 	0.398
06/11/08 	    DEM08_WTA 	465 	280.466 	0.600 	0.615 	0.603 	0.600
06/11/08 	    REP08_WTA 	782 	312.855 	0.397 	0.403 	0.400 	0.403
06/12/08 	    DEM08_WTA 	1,215 	763.819 	0.600 	0.700 	0.629 	0.700
06/12/08 	    REP08_WTA 	1,056 	420.300 	0.390 	0.401 	0.398 	0.390

# In the original web file the following line had a blank instead of 'NA'. This caused an
# error because 'read.table' saw a different number of fields for that line.
# The NA was added manually. (Note that blank lines are skipped and that comments preceded by
# the usual '#' can be included in a text file such as this one.)

06/13/08 	    DEM08_WTA 	0 	0.000 	0.000 	0.000   NA  0.700
06/13/08 	    REP08_WTA 	321 	125.200 	0.390 	0.391 	0.390 	0.390
06/14/08 	    DEM08_WTA 	163 	99.579 	0.610 	0.625 	0.611 	0.610
06/14/08 	    REP08_WTA 	288 	112.324 	0.390 	0.391 	0.390 	0.391
06/15/08 	    DEM08_WTA 	48 	29.175 	0.603 	0.610 	0.608 	0.603
06/15/08 	    REP08_WTA 	209 	84.305 	0.391 	0.414 	0.403 	0.410
06/16/08 	    DEM08_WTA 	113 	68.172 	0.600 	0.611 	0.603 	0.609
06/16/08 	    REP08_WTA 	250 	100.117 	0.400 	0.408 	0.400 	0.408
06/17/08 	    DEM08_WTA 	55 	33.389 	0.604 	0.610 	0.607 	0.604
06/17/08 	    REP08_WTA 	32 	12.861 	0.396 	0.403 	0.402 	0.403
06/18/08 	    DEM08_WTA 	74 	44.403 	0.590 	0.604 	0.600 	0.601
06/18/08 	    REP08_WTA 	180 	72.760 	0.395 	0.408 	0.404 	0.395
06/19/08 	    DEM08_WTA 	157 	94.767 	0.601 	0.614 	0.604 	0.607
06/19/08 	    REP08_WTA 	17 	6.718 	0.395 	0.397 	0.395 	0.395
06/20/08 	    DEM08_WTA 	18 	10.980 	0.610 	0.610 	0.610 	0.610
06/20/08 	    REP08_WTA 	15 	5.925 	0.395 	0.395 	0.395 	0.395
06/21/08 	    DEM08_WTA 	151 	92.194 	0.610 	0.611 	0.611 	0.611
06/21/08 	    REP08_WTA 	171 	66.772 	0.390 	0.394 	0.390 	0.390
06/22/08 	    DEM08_WTA 	26 	15.989 	0.614 	0.615 	0.615 	0.615
06/22/08 	    REP08_WTA 	16 	6.145 	0.384 	0.385 	0.384 	0.384
06/23/08 	    DEM08_WTA 	937 	596.520 	0.615 	0.642 	0.637 	0.620
06/23/08 	    REP08_WTA 	1,419 	552.470 	0.375 	0.400 	0.389 	0.381
06/24/08 	    DEM08_WTA 	261 	161.625 	0.617 	0.635 	0.619 	0.635
06/24/08 	    REP08_WTA 	307 	117.820 	0.381 	0.389 	0.384 	0.381
06/25/08 	    DEM08_WTA 	609 	384.129 	0.621 	0.649 	0.631 	0.622
06/25/08 	    REP08_WTA 	302 	113.712 	0.371 	0.389 	0.377 	0.371
06/26/08 	    DEM08_WTA 	840 	544.480 	0.642 	0.649 	0.648 	0.649
06/26/08 	    REP08_WTA 	794 	296.362 	0.373 	0.378 	0.373 	0.373
06/27/08 	    DEM08_WTA 	17 	11.016 	0.648 	0.648 	0.648 	0.648
06/27/08 	    REP08_WTA 	105 	39.306 	0.373 	0.375 	0.374 	0.375
06/28/08 	    DEM08_WTA 	75 	47.900 	0.630 	0.643 	0.639 	0.630
06/28/08 	    REP08_WTA 	67 	25.071 	0.373 	0.375 	0.374 	0.375
06/29/08 	    DEM08_WTA 	805 	517.378 	0.637 	0.653 	0.643 	0.645
06/29/08 	    REP08_WTA 	1,110 	408.838 	0.355 	0.370 	0.368 	0.355
06/30/08 	    DEM08_WTA 	172 	111.611 	0.648 	0.650 	0.649 	0.649
06/30/08 	    REP08_WTA 	195 	69.574 	0.350 	0.369 	0.357 	0.364
07/01/08  	    DEM08_WTA  	 482  	 318.530  	 0.649  	 0.670  	 0.661  	 0.662
07/01/08 	    REP08_WTA 	344 	119.320 	0.326 	0.358 	0.347 	0.358
07/02/08 	    DEM08_WTA 	125 	83.504 	0.662 	0.672 	0.668 	0.670
07/02/08 	    REP08_WTA 	25 	8.326 	0.328 	0.345 	0.333 	0.328
07/03/08 	    DEM08_WTA 	1,202 	823.907 	0.669 	0.722 	0.685 	0.722
07/03/08 	    REP08_WTA 	1,583 	520.873 	0.326 	0.332 	0.329 	0.326
07/04/08 	    DEM08_WTA 	327 	224.278 	0.672 	0.694 	0.686 	0.694
07/04/08 	    REP08_WTA 	392 	127.241 	0.324 	0.333 	0.325 	0.333
07/05/08 	    DEM08_WTA 	28 	18.742 	0.669 	0.670 	0.669 	0.669
07/05/08 	    REP08_WTA 	292 	95.609 	0.324 	0.334 	0.327 	0.334
07/06/08 	    DEM08_WTA 	181 	124.518 	0.687 	0.689 	0.688 	0.689
07/06/08 	    REP08_WTA 	66 	21.804 	0.330 	0.334 	0.330 	0.330
07/07/08 	    DEM08_WTA 	75 	51.128 	0.665 	0.688 	0.682 	0.680
07/07/08 	    REP08_WTA 	1,244 	416.917 	0.330 	0.345 	0.335 	0.344
07/08/08 	    DEM08_WTA 	668 	441.174 	0.635 	0.679 	0.660 	0.676
07/08/08 	    REP08_WTA 	931 	324.510 	0.338 	0.365 	0.349 	0.360
07/09/08 	    DEM08_WTA 	123 	78.347 	0.631 	0.643 	0.637 	0.640
07/09/08 	    REP08_WTA 	71 	25.958 	0.361 	0.369 	0.366 	0.369
07/10/08 	    DEM08_WTA 	78 	50.157 	0.640 	0.652 	0.643 	0.640
07/10/08 	    REP08_WTA 	261 	95.290 	0.359 	0.368 	0.365 	0.360
07/11/08 	    DEM08_WTA 	86 	55.282 	0.640 	0.652 	0.643 	0.642
07/11/08 	    REP08_WTA 	483 	172.729 	0.350 	0.361 	0.358 	0.360
07/12/08 	    DEM08_WTA 	182 	116.496 	0.640 	0.643 	0.640 	0.640
07/12/08 	    REP08_WTA 	276 	99.803 	0.359 	0.363 	0.362 	0.363
07/13/08 	    DEM08_WTA 	70 	44.959 	0.642 	0.643 	0.642 	0.643
07/13/08 	    REP08_WTA 	211 	78.425 	0.358 	0.379 	0.372 	0.358
07/14/08 	    DEM08_WTA 	356 	230.850 	0.640 	0.662 	0.648 	0.662
07/14/08 	    REP08_WTA 	217 	79.064 	0.354 	0.369 	0.364 	0.369
07/15/08 	    DEM08_WTA 	685 	467.136 	0.649 	0.690 	0.682 	0.649
07/15/08 	    REP08_WTA 	432 	151.445 	0.350 	0.355 	0.351 	0.351
07/16/08 	    DEM08_WTA 	88 	57.585 	0.646 	0.665 	0.654 	0.646
07/16/08 	    REP08_WTA 	637 	225.372 	0.350 	0.358 	0.354 	0.354
07/17/08 	    DEM08_WTA 	848 	552.857 	0.645 	0.668 	0.652 	0.649
07/17/08 	    REP08_WTA 	827 	291.809 	0.352 	0.355 	0.353 	0.355
07/18/08 	    DEM08_WTA 	118 	76.646 	0.646 	0.652 	0.650 	0.646
07/18/08 	    REP08_WTA 	216 	76.090 	0.352 	0.356 	0.352 	0.355
07/19/08 	    DEM08_WTA 	281 	183.192 	0.651 	0.658 	0.652 	0.651
07/19/08 	    REP08_WTA 	289 	101.590 	0.351 	0.355 	0.352 	0.351
07/20/08 	    DEM08_WTA 	135 	88.824 	0.655 	0.660 	0.658 	0.660
07/20/08 	    REP08_WTA 	49 	17.199 	0.351 	0.351 	0.351 	0.351
07/21/08 	    DEM08_WTA 	149 	97.462 	0.642 	0.658 	0.654 	0.656
07/21/08 	    REP08_WTA 	151 	53.001 	0.351 	0.351 	0.351 	0.351
07/22/08 	    DEM08_WTA 	60 	38.866 	0.640 	0.656 	0.648 	0.640
07/22/08 	    REP08_WTA 	136 	48.677 	0.351 	0.364 	0.358 	0.360
07/23/08 	    DEM08_WTA 	649 	415.001 	0.635 	0.648 	0.639 	0.636
07/23/08 	    REP08_WTA 	1,008 	367.336 	0.355 	0.365 	0.364 	0.365
07/24/08 	    DEM08_WTA 	421 	272.560 	0.635 	0.660 	0.647 	0.635
07/24/08 	    REP08_WTA 	296 	108.137 	0.365 	0.374 	0.365 	0.370
07/25/08 	    DEM08_WTA 	187 	120.817 	0.637 	0.650 	0.646 	0.648
07/25/08 	    REP08_WTA 	81 	29.458 	0.355 	0.370 	0.364 	0.355
07/26/08 	    DEM08_WTA 	844 	568.284 	0.649 	0.699 	0.673 	0.688
07/26/08 	    REP08_WTA 	280 	99.055 	0.352 	0.355 	0.354 	0.355
07/27/08 	    DEM08_WTA 	81 	52.401 	0.643 	0.653 	0.647 	0.643
07/27/08 	    REP08_WTA 	351 	124.258 	0.352 	0.358 	0.354 	0.358
07/28/08 	    DEM08_WTA 	367 	235.546 	0.620 	0.650 	0.642 	0.629
07/28/08 	    REP08_WTA 	430 	158.249 	0.358 	0.381 	0.368 	0.381
07/29/08 	    DEM08_WTA 	39 	24.475 	0.625 	0.630 	0.628 	0.626
07/29/08 	    REP08_WTA 	26 	9.674 	0.370 	0.375 	0.372 	0.375
07/30/08 	    DEM08_WTA 	154 	96.837 	0.623 	0.645 	0.629 	0.623
07/30/08 	    REP08_WTA 	199 	75.854 	0.364 	0.386 	0.381 	0.364
07/31/08 	    DEM08_WTA 	107 	67.851 	0.625 	0.640 	0.634 	0.627
07/31/08 	    REP08_WTA 	305 	114.180 	0.365 	0.380 	0.374 	0.373
08/01/08 	    DEM08_WTA 	608 	383.566 	0.626 	0.640 	0.631 	0.627
08/01/08 	    REP08_WTA 	237 	89.420 	0.374 	0.381 	0.377 	0.374
08/02/08 	    DEM08_WTA 	520 	325.994 	0.625 	0.633 	0.627 	0.625
08/02/08 	    REP08_WTA 	706 	265.181 	0.374 	0.379 	0.376 	0.375
08/03/08 	    DEM08_WTA 	585 	365.597 	0.620 	0.634 	0.625 	0.634
08/03/08 	    REP08_WTA 	391 	147.013 	0.375 	0.376 	0.376 	0.376
08/04/08 	    DEM08_WTA 	313 	196.660 	0.625 	0.634 	0.628 	0.634
08/04/08 	    REP08_WTA 	228 	86.376 	0.376 	0.381 	0.379 	0.379
08/05/08 	    DEM08_WTA 	392 	244.326 	0.611 	0.633 	0.623 	0.615
08/05/08 	    REP08_WTA 	215 	81.667 	0.376 	0.384 	0.380 	0.384
08/06/08 	    DEM08_WTA 	271 	167.633 	0.610 	0.628 	0.619 	0.625
08/06/08 	    REP08_WTA 	1,125 	427.531 	0.376 	0.386 	0.380 	0.383
08/07/08 	    DEM08_WTA 	443 	277.467 	0.623 	0.635 	0.626 	0.635
08/07/08 	    REP08_WTA 	843 	317.072 	0.372 	0.384 	0.376 	0.372
08/08/08 	    DEM08_WTA 	238 	150.356 	0.628 	0.634 	0.632 	0.628
08/08/08 	    REP08_WTA 	282 	105.291 	0.373 	0.376 	0.373 	0.373
08/09/08 	    DEM08_WTA 	102 	63.954 	0.627 	0.627 	0.627 	0.627
08/09/08 	    REP08_WTA 	403 	151.519 	0.373 	0.377 	0.376 	0.377
08/10/08 	    DEM08_WTA 	11 	6.836 	0.621 	0.622 	0.621 	0.621
08/10/08 	    REP08_WTA 	0 	0.000 	0.000 	0.000    NA    0.377
08/11/08 	    DEM08_WTA 	39 	24.344 	0.621 	0.626 	0.624 	0.626
08/11/08 	    REP08_WTA 	256 	96.395 	0.373 	0.377 	0.377 	0.373
08/12/08 	    DEM08_WTA 	0 	0.000 	0.000 	0.000 		NA  0.626
08/12/08 	    REP08_WTA 	0 	0.000 	0.000 	0.000 		NA  0.373
08/13/08 	    DEM08_WTA 	105 	66.298 	0.623 	0.636 	0.631 	0.636
08/13/08 	    REP08_WTA 	1,187 	446.283 	0.372 	0.380 	0.376 	0.375
08/14/08 	    DEM08_WTA 	348 	220.059 	0.624 	0.650 	0.632 	0.625
08/14/08 	    REP08_WTA 	559 	211.157 	0.373 	0.381 	0.378 	0.379
08/15/08 	    DEM08_WTA 	0 	0.000 	0.000 	0.000   NA    0.625
08/15/08 	    REP08_WTA 	29 	10.846 	0.374 	0.374 	0.374 	0.374
08/16/08 	    DEM08_WTA 	53 	33.072 	0.624 	0.624 	0.624 	0.624
08/16/08 	    REP08_WTA 	243 	91.965 	0.376 	0.381 	0.378 	0.376
08/17/08 	    DEM08_WTA 	45 	28.239 	0.627 	0.628 	0.628 	0.628
08/17/08 	    REP08_WTA 	138 	52.385 	0.378 	0.380 	0.380 	0.380
08/18/08 	    DEM08_WTA 	106 	65.553 	0.612 	0.620 	0.618 	0.617
08/18/08 	    REP08_WTA 	537 	206.684 	0.382 	0.388 	0.385 	0.385
08/19/08 	    DEM08_WTA 	1,073 	659.487 	0.600 	0.619 	0.615 	0.619
08/19/08 	    REP08_WTA 	1,079 	419.657 	0.382 	0.394 	0.389 	0.383
08/20/08 	    DEM08_WTA 	1,122 	682.651 	0.595 	0.620 	0.608 	0.596
08/20/08 	    REP08_WTA 	1,399 	547.950 	0.375 	0.396 	0.392 	0.396
08/21/08 	    DEM08_WTA 	514 	316.649 	0.603 	0.624 	0.616 	0.607
08/21/08 	    REP08_WTA 	899 	356.660 	0.394 	0.398 	0.397 	0.394
08/22/08 	    DEM08_WTA 	3 	1.817 	0.603 	0.610 	0.606 	0.603
08/22/08 	    REP08_WTA 	329 	129.680 	0.385 	0.399 	0.394 	0.385
08/23/08 	    DEM08_WTA 	741 	454.898 	0.598 	0.620 	0.614 	0.618
08/23/08 	    REP08_WTA 	691 	277.080 	0.397 	0.410 	0.401 	0.402
08/24/08 	    DEM08_WTA 	38 	22.800 	0.600 	0.600 	0.600 	0.600
08/24/08 	    REP08_WTA 	235 	94.098 	0.400 	0.402 	0.400 	0.400
08/25/08 	    DEM08_WTA 	874 	540.660 	0.600 	0.620 	0.619 	0.605
08/25/08 	    REP08_WTA 	1,075 	422.889 	0.385 	0.410 	0.393 	0.403
08/26/08 	    DEM08_WTA 	714 	405.140 	0.540 	0.599 	0.567 	0.555
08/26/08 	    REP08_WTA 	3,703 	1,662.557 	0.403 	0.499 	0.449 	0.445
08/27/08 	    DEM08_WTA 	447 	259.215 	0.579 	0.594 	0.580 	0.594
08/27/08 	    REP08_WTA 	550 	241.967 	0.425 	0.450 	0.440 	0.425
08/28/08 	    DEM08_WTA 	2,980 	1,777.148 	0.570 	0.620 	0.596 	0.620
08/28/08 	    REP08_WTA 	2,531 	1,073.118 	0.406 	0.439 	0.424 	0.410
08/29/08 	    DEM08_WTA 	3,148 	1,903.061 	0.577 	0.634 	0.605 	0.606
08/29/08 	    REP08_WTA 	5,014 	2,062.118 	0.400 	0.429 	0.411 	0.402
08/30/08 	    DEM08_WTA 	994 	599.041 	0.593 	0.620 	0.603 	0.607
08/30/08 	    REP08_WTA 	821 	328.110 	0.392 	0.403 	0.400 	0.394
08/31/08	    DEM08_WTA	 389	 234.188	 0.596	 0.607	 0.602	 0.606
08/31/08	    REP08_WTA	 225	 89.543	 0.392	 0.399	 0.398	 0.398
09/01/08	    DEM08_WTA	 424	 256.645	 0.600	 0.610	 0.605	 0.602
09/01/08	    REP08_WTA	 2,701	 1,086.106	 0.392	 0.425	 0.402	 0.395
09/02/08	    DEM08_WTA	 2,534	 1,543.558	 0.601	 0.615	 0.609	 0.606
09/02/08	    REP08_WTA	 4,257	 1,680.234	 0.390	 0.407	 0.395	 0.396

", file = "ziowa.txt")   # save data in a temporary file

# identify rows that have the wrong number of fields and fix if necessary

which( print(count.fields( 'ziowa.txt')) != 8) 

ziowa <- read.table( 'ziowa.txt', header = TRUE)  # read file into a data frame
head(ziowa)

# remove duplicated rows in case you cut and pasted the same lines more than once

ziowa <- ziowa[ !duplicated(ziowa),]  
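
As an alternative to adding NA by hand, the blank field can be kept by splitting on tabs. The sketch below is a small illustration, not the method used above: it assumes the pasted text is tab-separated, as the web page's table appears to be, and it passes the text directly with the 'text' argument of 'read.table' instead of writing a temporary file. The three lines are copied from the data, with AvgPrice left blank on the last one.

```r
# Header plus two data lines in the layout of the pasted data.
# Fields are tab-separated; AvgPrice is blank on the last line.
txt <- paste(
  "Date\tContract\tUnits\tVolume\tLowPrice\tHighPrice\tAvgPrice\tLastPrice",
  "06/12/08\t DEM08_WTA\t 1,215\t 763.819\t 0.600\t 0.700\t 0.629\t 0.700",
  "06/13/08\t DEM08_WTA\t 0\t 0.000\t 0.000\t 0.000\t \t 0.700",
  sep = "\n")

# With the default whitespace separator the blank field vanishes,
# so the last line has one field fewer than the others.
tc <- textConnection(txt)
n_fields <- count.fields(tc)
close(tc)
n_fields

# Splitting on tabs keeps the empty field. strip.white trims the padding
# around each field, and an empty numeric field is read as NA.
dd <- read.table(text = txt, header = TRUE, sep = "\t", strip.white = TRUE)
dd$AvgPrice
```

This only helps if the tabs survive the copy and paste; some editors convert them to spaces, in which case the manual fix remains necessary.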

Basic data conversion

Variables read from raw text files often need conversion before they can be used for analysis. In the example above, we would like 'Date' to be a "Date" variable, and some numeric variables were read as factors because 'read.table' treats numbers that use commas as thousands separators as character data. These problems are easy to fix. First have a look at the data and the class of each variable (xqplot in 'fun.R' does this easily):

> head(ziowa)
      Date  Contract Units  Volume LowPrice HighPrice AvgPrice LastPrice
1 06/01/08 DEM08_WTA   230 140.164    0.601     0.617    0.609     0.617
2 06/01/08 REP08_WTA   108  42.932    0.397     0.400    0.398     0.397
3 06/02/08 DEM08_WTA   139  83.820    0.600     0.609    0.603     0.604
4 06/02/08 REP08_WTA   380 156.543    0.403     0.423    0.412     0.414
5 06/03/08 DEM08_WTA   318 195.291    0.604     0.625    0.614     0.619
6 06/03/08 REP08_WTA   338 131.976    0.380     0.399    0.390     0.399
> ziowa = ziowa[ !duplicated(ziowa),]
> 
> sapply( ziowa, class)
     Date  Contract     Units    Volume  LowPrice HighPrice  AvgPrice LastPrice 
 "factor"  "factor"  "factor"  "factor" "numeric" "numeric" "numeric" "numeric" 

We note that 'Units' and 'Volume' should be numeric and 'Date' should be a Date object. The conversions are easy:

ziowa$Date     <- as.Date( ziowa$Date, "%m/%d/%y")    # see ?as.Date
ziowa$Volume   <- as.numeric( gsub(",","", ziowa$Volume))   # remove all commas from numbers
ziowa$Units    <- as.numeric( gsub(",","", ziowa$Units))
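
The same conversions can be tried on a small stand-in data frame. This is a sketch under assumptions: the helper name 'to_number' is hypothetical, and the two rows are copied from the data above. The explicit 'as.character' matters because calling 'as.numeric' directly on a factor returns its internal integer codes, not the numbers printed.

```r
# Hypothetical helper: strip thousands separators, then convert to numeric.
to_number <- function(x) as.numeric(gsub(",", "", as.character(x), fixed = TRUE))

# Small stand-in for 'ziowa' with the same kinds of columns,
# stored as factors to mimic the classes shown above.
dd <- data.frame(Date   = c("08/28/08", "08/29/08"),
                 Units  = c("2,980", "3,148"),
                 Volume = c("1,777.148", "1,903.061"),
                 stringsAsFactors = TRUE)

dd$Date <- as.Date(as.character(dd$Date), "%m/%d/%y")
dd[c("Units", "Volume")] <- lapply(dd[c("Units", "Volume")], to_number)

sapply(dd, class)   # check the classes after conversion
```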

Now it is easy to have a look at the volatility of predictions for the Democratic presidential candidate.

> library(lattice)
> xyplot( LowPrice + AvgPrice + HighPrice ~ Date, ziowa,
+     subset = Contract == "DEM08_WTA" & LowPrice > .01,
+     type = 'b', lwd = c( .5,2,.5), col = c('black','red','black'))

Image:IowaMarkets.png