worldwide

it's nice elsewhere too


2014.07
datasci-1903 - It's actually quite easy to develop and test the Pig scripts locally; see Ge Peng's instructions in this thread: https://class.coursera.org/datasci-002/forum/thread?thread_id=1754#post-8530 (Pig bundles the Hadoop libraries it needs for local mode and runs locally out of the box, i.e. there is no need for a VM if Java is installed)
# hadoop

cd
mkdir pig
cd pig
curl http://download.nextag.com/apache/pig/pig-0.13.0/pig-0.13.0.tar.gz -o pig-0.13.0.tar.gz
tar xzvf pig-0.13.0.tar.gz
cd ~/intrDataScience/datasci_course_materials/assignment4
~/pig/pig-0.13.0/bin/pig -x local
grunt> register ./pigtest/myudfs.jar
grunt> raw = load './cse344-test-file' USING TextLoader as (line:chararray);
grunt> ntriples = foreach raw generate FLATTEN(myudfs.RDFSplit3(line)) as (subject:chararray,predicate:chararray,object:chararray);
grunt> describe ntriples
ntriples: {subject: chararray,predicate: chararray,object: chararray}
grunt> line1 = limit ntriples 1;
grunt> dump line1
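
The same statements can probably be run non-interactively as well, which is handy when re-testing after every change. The following is only a minimal sketch, not part of Ge Peng's instructions: the file name first_triple.pig and the PIG_HOME variable are my own assumptions, and it expects to be run from the assignment4 directory so that the relative paths to the jar and the test file resolve.

```shell
#!/bin/sh
# Sketch: batch version of the interactive grunt session above.
# PIG_HOME and the script name first_triple.pig are assumptions for illustration.
PIG_HOME="${PIG_HOME:-$HOME/pig/pig-0.13.0}"
SCRIPT="${SCRIPT:-first_triple.pig}"

# Write the statements that were typed at the grunt> prompt into a script file.
cat > "$SCRIPT" <<'EOF'
register ./pigtest/myudfs.jar;
raw = load './cse344-test-file' USING TextLoader as (line:chararray);
ntriples = foreach raw generate FLATTEN(myudfs.RDFSplit3(line)) as (subject:chararray,predicate:chararray,object:chararray);
line1 = limit ntriples 1;
dump line1;
EOF

# Run in local mode only if Pig is actually installed at the assumed location.
if [ -x "$PIG_HOME/bin/pig" ]; then
    "$PIG_HOME/bin/pig" -x local "$SCRIPT"
else
    echo "pig not found at $PIG_HOME/bin/pig; wrote $SCRIPT only"
fi
```

Passing a script file after -x local makes Pig execute it and exit instead of opening the grunt prompt, so the dump output ends up directly in the terminal.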