Copying files and directories
This documents the expected behavior of the fsspec file and directory copying functions.
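There are three functions of interest here: copy(), get() and put().
Each of these copies files and/or directories from a source to a target location.
If we refer to our filesystem of interest, derived from AbstractFileSystem,
as the remote filesystem (even though it may be local), then the difference between the three
functions is:
- copy() copies from a remote filesystem to the same remote filesystem,
- put() copies from a local filesystem to a remote filesystem,
- get() copies from a remote filesystem to a local filesystem.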
The source and target are the first two arguments passed to these functions, and each
consists of one or more files, directories and/or glob (wildcard) patterns.
The behavior of the fsspec copy functions is intended to be the same as that obtained using
POSIX command line cp, but fsspec functions have extra functionality because:
- They support more than one target, whereas command line cp is restricted to one.
- They can create new directories, either automatically or via the auto_mkdir=True
keyword argument, whereas command line cp only does this as part of a recursive copy.
Expected behavior
There follows a comprehensive list of the expected behavior of the fsspec copying functions
that also forms the basis of a set of tests that all classes that derive from
AbstractFileSystem can be tested against to check that they conform.
For all scenarios the source filesystem contains the following directories and files:
📁 source
├── 📄 file1
├── 📄 file2
└── 📁 subdir
    ├── 📄 subfile1
    ├── 📄 subfile2
    └── 📁 nesteddir
        └── 📄 nestedfile
and before each scenario the target directory exists and is empty unless otherwise noted:
📁 target
All example code uses cp() which is an alias of
copy(); equivalent behavior is expected by
get() and put().
Forward slashes are used for directory separators throughout.
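To try these scenarios locally, the fixture above can be recreated with the Python standard library. This is a minimal sketch: the helper make_fixture and the list SOURCE_FILES are hypothetical names, not part of fsspec, and each file's content is simply its relative path so that copies are easy to check.

```python
import tempfile
from pathlib import Path
from typing import Tuple

# Files in the example "source" tree, relative to "source" (hypothetical constant).
SOURCE_FILES = [
    "file1",
    "file2",
    "subdir/subfile1",
    "subdir/subfile2",
    "subdir/nesteddir/nestedfile",
]

def make_fixture(root: Path) -> Tuple[Path, Path]:
    """Hypothetical helper: create source/ and an empty target/ under root."""
    source = root / "source"
    for rel in SOURCE_FILES:
        path = source / rel
        path.parent.mkdir(parents=True, exist_ok=True)  # creates subdir/ and nesteddir/
        path.write_text(rel)  # content is the relative path, for easy checking
    target = root / "target"
    target.mkdir()  # empty target directory, as before each scenario
    return source, target

with tempfile.TemporaryDirectory() as tmp:
    source, target = make_fixture(Path(tmp))
    print(sorted(p.relative_to(source).as_posix() for p in source.rglob("*") if p.is_file()))
    # ['file1', 'file2', 'subdir/nesteddir/nestedfile', 'subdir/subfile1', 'subdir/subfile2']
```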
1. Single source to single target
1a. File to existing directory
cp("source/subdir/subfile1", "target/")
results in:
📁 target
└── 📄 subfile1
The trailing slash on "target/" is optional but recommended as it explicitly indicates that
the target is a directory.
1b. File to new directory
cp("source/subdir/subfile1", "target/newdir/")
results in:
📁 target
└── 📁 newdir
    └── 📄 subfile1
This fails if the target file system is not capable of creating the directory, for example
if it is write-only or if auto_mkdir=False. There is no command line equivalent of this
scenario without an explicit mkdir to create the new directory.
The trailing slash is required on the new directory; otherwise it is interpreted as a filename, which is a different scenario (1d. File to file in new directory).
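The dependence on directory creation can be illustrated with a local, standard-library analogue. In this sketch, copy_file and its auto_mkdir flag are hypothetical stand-ins that only mimic the keyword argument of the same name; they are not fsspec's implementation.

```python
import shutil
from pathlib import Path

def copy_file(src: Path, dst: Path, auto_mkdir: bool = True) -> Path:
    """Hypothetical helper: copy one file, creating missing parent dirs if allowed."""
    if auto_mkdir:
        # Equivalent of the implicit "mkdir -p" that scenario 1b relies on.
        dst.parent.mkdir(parents=True, exist_ok=True)
    # If dst.parent is missing and auto_mkdir is False, this raises FileNotFoundError.
    shutil.copyfile(src, dst)
    return dst
```

For scenario 1b the destination is target/newdir/subfile1; with auto_mkdir=False the call fails because target/newdir does not exist yet.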
1c. File to file in existing directory
cp("source/subdir/subfile1", "target/newfile")
results in:
📁 target
└── 📄 newfile
The target must not have a trailing slash, because "newfile/" is interpreted as a new directory,
which is a different scenario (1b. File to new directory).
1d. File to file in new directory
cp("source/subdir/subfile1", "target/newdir/newfile")
creates the new directory and copies the file into it:
📁 target
└── 📁 newdir
    └── 📄 newfile
This fails if the target file system is not capable of creating the directory, for example
if it is write-only or if auto_mkdir=False. There is no command line equivalent of this
scenario without an explicit mkdir to create the new directory.
If the target has a trailing slash, as in "target/newdir/newfile/", it is interpreted as
a new directory, which is a different scenario (1b. File to new directory).
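The rules of scenarios 1a to 1d reduce to a small decision about where a single file ends up. The following sketches that decision only, assuming the set of existing directories is known in advance; resolve_file_target is a hypothetical name, and creating any missing parent directories (1b, 1d) is left to the caller.

```python
import posixpath
from typing import Set

def resolve_file_target(source: str, target: str, existing_dirs: Set[str]) -> str:
    """Hypothetical helper: destination path for copying one file (rules 1a-1d)."""
    name = posixpath.basename(source)
    if target.endswith("/"):
        # 1a/1b: a trailing slash always means "directory", existing or new.
        return target + name
    if target in existing_dirs:
        # 1a without the optional trailing slash.
        return posixpath.join(target, name)
    # 1c/1d: anything else is taken as the new filename itself.
    return target
```

With existing_dirs={"target"}, the targets "target/", "target/newdir/", "target/newfile" and "target/newdir/newfile" give the destinations shown in 1a, 1b, 1c and 1d respectively.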
1e. Directory to existing directory
Warning
The current behavior with recursive=False is not correct.
maxdepth is not yet implemented for copying functions.
cp("source/subdir/", "target/", recursive=True)
results in:
📁 target
├── 📄 subfile1
├── 📄 subfile2
└── 📁 nesteddir
    └── 📄 nestedfile
The recursive=True keyword argument is required; otherwise the call does nothing. The depth
of recursion can be controlled using the maxdepth keyword argument, for example:
cp("source/subdir/", "target/", recursive=True, maxdepth=1)
results in:
📁 target
├── 📄 subfile1
└── 📄 subfile2
The trailing slash on "target/" is optional but recommended as it explicitly indicates that
the target is a directory.
If the trailing slash is omitted from "source/subdir" then the subdir is also copied,
not just its contents:
cp("source/subdir", "target/", recursive=True)
results in:
📁 target
└── 📁 subdir
    ├── 📄 subfile1
    ├── 📄 subfile2
    └── 📁 nesteddir
        └── 📄 nestedfile
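The effect of the trailing slash on the source, and of maxdepth, can be expressed as a mapping from source files to destination paths. This sketch assumes recursive=True, an existing target directory, and that maxdepth=1 means "files directly inside the source directory only", which matches the example above. plan_dir_copy is hypothetical and, as the warning notes, maxdepth is not yet implemented in fsspec's copy functions.

```python
import posixpath
from typing import List, Optional

def plan_dir_copy(files: List[str], source: str, target: str,
                  maxdepth: Optional[int] = None) -> List[str]:
    """Hypothetical helper: destinations for a recursive directory copy (1e)."""
    base = source.rstrip("/")
    prefix = target.rstrip("/")
    if not source.endswith("/"):
        # No trailing slash: the directory itself is copied, not just its contents.
        prefix = posixpath.join(prefix, posixpath.basename(base))
    planned = []
    for path in files:
        if not path.startswith(base + "/"):
            continue  # outside the source directory
        rel = path[len(base) + 1:]
        if maxdepth is not None and rel.count("/") >= maxdepth:
            continue  # deeper than maxdepth levels below the source directory
        planned.append(posixpath.join(prefix, rel))
    return sorted(planned)
```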
1f. Directory to new directory
Warning
The current behavior with recursive=False is not correct.
maxdepth is not yet implemented for copying functions.
cp("source/subdir/", "target/newdir/", recursive=True)
results in:
📁 target
└── 📁 newdir
    ├── 📄 subfile1
    ├── 📄 subfile2
    └── 📁 nesteddir
        └── 📄 nestedfile
Trailing slashes on both source and target are optional and do not affect the result.
They are recommended, however, as they explicitly indicate that both are directories.
The recursive=True keyword argument is required; otherwise the call does nothing. The depth
of recursion can be controlled using the maxdepth keyword argument.
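For comparison, Python's shutil.copytree behaves like this scenario on a local filesystem: it creates the destination directory, including missing parents, and copies the contents of the source into it. This is only a local analogue for checking expectations, not how fsspec implements copy(), and it has no equivalent of maxdepth; copy_dir_to_new_dir is a hypothetical name.

```python
import shutil
from pathlib import Path
from typing import List

def copy_dir_to_new_dir(src_dir: Path, new_dir: Path) -> List[str]:
    """Hypothetical helper: copy src_dir's contents into new_dir (must not exist yet)."""
    shutil.copytree(src_dir, new_dir)  # creates new_dir and any missing parents
    # Return the copied files relative to new_dir, for comparison with the tree above.
    return sorted(p.relative_to(new_dir).as_posix() for p in new_dir.rglob("*") if p.is_file())
```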
1g. Glob to existing directory
Warning
This does not currently work correctly.
Nonrecursive
cp("source/subdir/*", "target/")
copies files from the top-level directory only and results in:
📁 target
├── 📄 subfile1
└── 📄 subfile2
Recursive
cp("source/subdir/*", "target/", recursive=True)
results in:
📁 target
├── 📄 subfile1
├── 📄 subfile2
└── 📁 nesteddir
    └── 📄 nestedfile
The depth of recursion can be controlled by the maxdepth keyword argument.
The trailing slash on "target/" is optional but recommended as it explicitly indicates that
the target is a directory.
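The difference between the nonrecursive and recursive forms can be sketched as a filter over the source file list. This assumes a single-level pattern such as "source/subdir/*" whose wildcard matches entries directly inside one directory. plan_glob_copy is hypothetical, uses fnmatch for the wildcard, and describes the expected behavior rather than what fsspec currently does.

```python
import fnmatch
import posixpath
from typing import List

def plan_glob_copy(files: List[str], pattern: str, target: str,
                   recursive: bool = False) -> List[str]:
    """Hypothetical helper: destinations for a one-level glob like "source/subdir/*"."""
    parent, name_pattern = posixpath.split(pattern)
    prefix = target.rstrip("/")
    planned = []
    for path in files:
        if not path.startswith(parent + "/"):
            continue  # not under the directory containing the wildcard
        rel = path[len(parent) + 1:]
        first = rel.split("/", 1)[0]  # entry directly inside parent
        if not fnmatch.fnmatchcase(first, name_pattern):
            continue  # entry not matched by the wildcard
        if "/" in rel and not recursive:
            continue  # file inside a matched directory: copied only when recursive
        planned.append(posixpath.join(prefix, rel))
    return sorted(planned)
```

Passing "target/newdir/" as the target gives the destinations of scenario 1h in the same way.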
1h. Glob to new directory
Warning
This does not currently work correctly.
Nonrecursive
cp("source/subdir/*", "target/newdir/")
copies files from the top-level directory only and results in:
📁 target
└── 📁 newdir
    ├── 📄 subfile1
    └── 📄 subfile2
Recursive
cp("source/subdir/*", "target/newdir/", recursive=True)
results in:
📁 target
└── 📁 newdir
    ├── 📄 subfile1
    ├── 📄 subfile2
    └── 📁 nesteddir
        └── 📄 nestedfile
The depth of recursion can be controlled by the maxdepth keyword argument.
The trailing slash on the target is optional but recommended as it explicitly indicates that
it is a directory.
These calls fail if the target file system is not capable of creating the directory, for
example if it is write-only or if auto_mkdir=False. There is no command line equivalent of
this scenario without an explicit mkdir to create the new directory.
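The same expansion can be carried out on a local filesystem with pathlib and shutil, creating the new directory first as auto_mkdir=True would. This is a hypothetical sketch (glob_copy is not an fsspec function); directories matched by the pattern are skipped unless recursive is set.

```python
import shutil
from pathlib import Path

def glob_copy(src_dir: Path, pattern: str, new_dir: Path, recursive: bool = False) -> None:
    """Hypothetical helper: copy entries of src_dir matching pattern into new_dir."""
    new_dir.mkdir(parents=True, exist_ok=True)  # stand-in for auto_mkdir=True
    for entry in sorted(src_dir.glob(pattern)):
        if entry.is_dir():
            if recursive:
                shutil.copytree(entry, new_dir / entry.name)  # copy the whole subtree
            # Nonrecursive: matched directories are skipped entirely.
        else:
            shutil.copy2(entry, new_dir / entry.name)
```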
2. Multiple sources to single target
2a. List of files to existing directory
cp(["source/file1", "source/file2", "source/subdir/subfile1"], "target/")
results in:
📁 target
├── 📄 file1
├── 📄 file2
└── 📄 subfile1
All of the files are copied to the target directory regardless of their relative paths in the
source filesystem. The trailing slash on the target is optional but recommended as it
explicitly indicates that it is a directory.
Warning
This does not currently work correctly.
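Because the source paths are flattened, the destination of each file depends only on its basename. A one-function sketch of that expected rule, with the hypothetical name plan_list_copy:

```python
import posixpath
from typing import List

def plan_list_copy(sources: List[str], target: str) -> List[str]:
    """Hypothetical helper: each listed file lands directly in target by basename."""
    directory = target.rstrip("/")  # "target/" and "target" name the same existing directory
    return [posixpath.join(directory, posixpath.basename(src)) for src in sources]
```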
2b. List of files to new directory
cp(["source/file1", "source/file2", "source/subdir/subfile1"], "target/newdir/")
results in:
📁 target
└── 📁 newdir
    ├── 📄 file1
    ├── 📄 file2
    └── 📄 subfile1
All of the files are copied to the target directory regardless of their relative paths in the source filesystem.
The trailing slash is required on the new directory; otherwise it is interpreted as a filename rather than a directory.
Warning
This does not currently work correctly.
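A local sketch of this scenario using the standard library follows. copy_files_to_new_dir is hypothetical, and raising ValueError for a target without the trailing slash is a choice made by this sketch to surface the filename interpretation early; it is not documented fsspec behavior.

```python
import shutil
from pathlib import Path
from typing import Iterable, List

def copy_files_to_new_dir(sources: Iterable[Path], target: str) -> List[Path]:
    """Hypothetical helper: copy several files into a directory that may not exist yet."""
    if not target.endswith("/"):
        # This sketch's own choice: without the slash the target reads as a filename,
        # which cannot hold several files, so refuse before copying anything.
        raise ValueError(f"expected a directory target ending in '/': {target!r}")
    directory = Path(target)
    directory.mkdir(parents=True, exist_ok=True)  # create the new directory
    copied = []
    for src in map(Path, sources):
        dst = directory / src.name  # flatten: keep only the basename
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```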