public class TimePartitionedDataPublisher extends BaseDataPublisher
This is an updated copy of TimePartitionedDataPublisher
in gobblin-core module. This file should be deleted in favor of the upstream version when possible.
Updates are:
addWriterOutputToNewDir(org.apache.hadoop.fs.Path, org.apache.hadoop.fs.Path, org.apache.gobblin.configuration.WorkUnitState, int, org.apache.gobblin.util.ParallelRunner)
to recursively iterate moving files from subfolders,
allowing to have detailed folders in states PUBLISH_DIRS -- https://github.com/apache/gobblin/pull/3409closer, lineageInfo, metadataMergers, metaDataWriterFileSystemByBranches, numBranches, parallelRunnerCloser, parallelRunners, parallelRunnerThreads, permissions, publisherFileSystemByBranches, publisherFinalDirOwnerGroupsByBranches, publisherOutputDirs, retrierConfig, shouldRetry, writerFileSystemByBranches
Constructor and Description |
---|
TimePartitionedDataPublisher(org.apache.gobblin.configuration.State state) |
Modifier and Type | Method and Description |
---|---|
protected void |
addWriterOutputToExistingDir(org.apache.hadoop.fs.Path writerOutput,
org.apache.hadoop.fs.Path publisherOutput,
org.apache.gobblin.configuration.WorkUnitState workUnitState,
int branchId,
org.apache.gobblin.util.ParallelRunner parallelRunner)
This method needs to be overridden for TimePartitionedDataPublisher, since the output folder structure
contains timestamp, we have to move the files recursively.
|
protected void |
addWriterOutputToNewDir(org.apache.hadoop.fs.Path writerOutput,
org.apache.hadoop.fs.Path publisherOutput,
org.apache.gobblin.configuration.WorkUnitState workUnitState,
int branchId,
org.apache.gobblin.util.ParallelRunner parallelRunner)
This method needs to be overridden for TimePartitionedDataPublisher, since the output folder structure
contains timestamp, we need to move the files recursively.
|
addSingleTaskWriterOutputToExistingDir, close, createDestinationDescriptor, getPublisherOutputDir, initialize, movePath, publishData, publishData, publishData, publishMetadata, publishMetadata, publishMultiTaskData, recordPublisherOutputDirs, shouldPublishMetadataFirst
getInstance, publish
public TimePartitionedDataPublisher(org.apache.gobblin.configuration.State state) throws IOException
IOException
protected void addWriterOutputToNewDir(org.apache.hadoop.fs.Path writerOutput, org.apache.hadoop.fs.Path publisherOutput, org.apache.gobblin.configuration.WorkUnitState workUnitState, int branchId, org.apache.gobblin.util.ParallelRunner parallelRunner) throws IOException
addWriterOutputToNewDir
in class BaseDataPublisher
IOException
protected void addWriterOutputToExistingDir(org.apache.hadoop.fs.Path writerOutput, org.apache.hadoop.fs.Path publisherOutput, org.apache.gobblin.configuration.WorkUnitState workUnitState, int branchId, org.apache.gobblin.util.ParallelRunner parallelRunner) throws IOException
addWriterOutputToExistingDir
in class BaseDataPublisher
IOException
Copyright © 2021. All rights reserved.