java - Multiple files as input on Amazon Elastic MapReduce
I'm trying to run a job on Elastic MapReduce (EMR) with a custom JAR. I'm trying to process 1000 files in a single directory. When I submit the job with the parameter s3n://bucketname/compressed/*.xml.gz, I get a "matched 0 files" error. If I pass the absolute path to a single file (e.g. s3n://bucketname/compressed/00001.xml.gz), it runs fine, but only that one file gets processed. I also tried using the name of the directory (s3n://bucketname/compressed/), hoping that the files within it would be processed, but that just passes the directory itself to the job.
At the same time, I have a smaller local Hadoop installation. There, when I pass the job a path with wildcards (/path/to/dir/on/hdfs/*.xml.gz), it works fine and all 1000 files are listed correctly.
How do I get EMR to list all the files?
I don't know how EMR lists files, but here's a piece of code that works for me:
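    // args[0] is the input directory, e.g. s3n://bucketname/compressed/
    FileSystem fs = FileSystem.get(URI.create(args[0]), job.getConfiguration());
    // List the directory's entries and add each one as an input path.
    FileStatus[] files = fs.listStatus(new Path(args[0]));
    for (FileStatus sfs : files) {
        FileInputFormat.addInputPath(job, sfs.getPath());
    }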
It lists all the files in the input directory, and you can then do whatever you want with them.
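If the directory might also hold files you don't want as input, one option is to filter the listed names against the same *.xml.gz pattern before adding them. Here is a minimal sketch of just that filtering step, using only the standard Java library so it can be tried without a cluster. The class name InputFileFilter, the method matching, and the sample file names are made up for illustration. In a real job the names would presumably come from sfs.getPath().getName() in the loop above, and only the matching entries would be passed to FileInputFormat.addInputPath.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: not part of Hadoop or EMR.
class InputFileFilter {

    // Returns the names that match the given glob, e.g. "*.xml.gz",
    // preserving the order in which they were listed.
    static List<String> matching(List<String> fileNames, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> kept = new ArrayList<>();
        for (String name : fileNames) {
            if (matcher.matches(Paths.get(name))) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample names standing in for a directory listing (assumed, not real data).
        List<String> listed = Arrays.asList("00001.xml.gz", "00002.xml.gz", "notes.txt");
        for (String name : matching(listed, "*.xml.gz")) {
            System.out.println("would add as input: " + name);
        }
    }
}
```

This only matches on the file name, not the full path, so it assumes all the inputs sit directly in one directory, as in the question.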