Monday, May 14, 2012

[EN] - Big Data Helper - Part 3 - Loading Data (files)

OK, so you have the environment ready and a basic understanding of some of the components of Hadoop. Now, what can you do, and how?
Well, to get started we need DATA. So how do we get some in?


Source data can come in many forms. We will discuss two sources: files and databases.
In this post we will cover files.


Steps:
  1. [S]FTP some file(s) to your environment's filesystem (with FileZilla, for example);
  2. Load your data to the HDFS filesystem:
    1. $ hadoop fs -copyFromLocal sap.csv .
  3. Voilà. All done. (A full worked example follows below.)
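
Here is a minimal end-to-end sketch of those steps, run from your own machine and then inside the Hadoop environment. The hostname hadoop-vm and the user hduser are hypothetical; adjust them to your setup:

# 1. From your desktop: copy the file over SCP/SFTP to the Hadoop machine
$ scp sap.csv hduser@hadoop-vm:/home/hduser/

# 2. On the Hadoop machine: copy it from the local filesystem into HDFS
#    ("." means your HDFS home directory, e.g. /user/hduser)
$ hadoop fs -copyFromLocal sap.csv .

# 3. Verify the file landed in HDFS
$ hadoop fs -ls .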


The command at 2.1 copies a file named sap.csv to your user's home folder on the HDFS filesystem (that is what the trailing "." stands for, typically /user/<username>).
You can see all the available filesystem commands with:
$ hadoop fs -help
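
For reference, a few of the commands you will use most often (the paths and filenames here are just examples):

# Make a directory in HDFS
$ hadoop fs -mkdir input

# Copy a local file into it (-put is equivalent to -copyFromLocal for local files)
$ hadoop fs -put sap.csv input/

# Inspect what is there
$ hadoop fs -ls input
$ hadoop fs -cat input/sap.csv | head

# Copy results back out of HDFS, or delete a file
$ hadoop fs -copyToLocal input/sap.csv /tmp/sap.csv
$ hadoop fs -rm input/sap.csv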

You are now ready to do some more interesting stuff: the main purpose of Hadoop, running MapReduce jobs (and this is just one of the ways of loading data).
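
Just to whet your appetite, here is what running the classic WordCount example over the file we just loaded might look like. The exact jar name and location vary between Hadoop versions and distributions, so treat this as a sketch (the output directory wordcount-out must not exist beforehand):

$ hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount sap.csv wordcount-out
$ hadoop fs -cat wordcount-out/part-r-00000 | head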

Thank you.




-- ====================

Other Tutorial Links

http://pinelasgarden.blogspot.pt/2012/04/en-big-data-helper-part-1-concepts.html
http://pinelasgarden.blogspot.pt/2012/04/en-big-data-helper-part-2-getting.html
http://pinelasgarden.blogspot.pt/2012/05/en-big-data-helper-part-4-pig.html
http://pinelasgarden.blogspot.pt/2012/05/en-big-data-helper-part-5-mapreduce.html
