g0601_0700.s0609_find_duplicate_file_in_system.Solution

Java-based LeetCode algorithm problem solutions, regularly updated
package g0601_0700.s0609_find_duplicate_file_in_system;
// #Medium #Array #String #Hash_Table #2022_03_21_Time_20_ms_(97.68%)_Space_51.3_MB_(87.10%)
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
/**
 * 609 - Find Duplicate File in System.
*
* Medium
*
* Given a list `paths` of directory info, including the directory path, and all the files with contents in this directory, return _all the duplicate files in the file system in terms of their paths_. You may return the answer in **any order**.
*
* A group of duplicate files consists of at least two files that have the same content.
*
* A single directory info string in the input list has the following format:
*
* * `"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"`
*
 * It means there are `n` files `(f1.txt, f2.txt ... fn.txt)` with content `(f1_content, f2_content ... fn_content)` respectively in the directory `"root/d1/d2/.../dm"`. Note that `n >= 1` and `m >= 0`. If `m = 0`, it means the directory is just the root directory.
*
* The output is a list of groups of duplicate file paths. For each group, it contains all the file paths of the files that have the same content. A file path is a string that has the following format:
*
* * `"directory_path/file_name.txt"`
*
* **Example 1:**
*
* **Input:** paths = ["root/a 1.txt(abcd) 2.txt(efgh)","root/c 3.txt(abcd)","root/c/d 4.txt(efgh)","root 4.txt(efgh)"]
*
* **Output:** [["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]
*
* **Example 2:**
*
* **Input:** paths = ["root/a 1.txt(abcd) 2.txt(efgh)","root/c 3.txt(abcd)","root/c/d 4.txt(efgh)"]
*
* **Output:** [["root/a/2.txt","root/c/d/4.txt"],["root/a/1.txt","root/c/3.txt"]]
*
* **Constraints:**
*
 * * `1 <= paths.length <= 2 * 10^4`
* * `1 <= paths[i].length <= 3000`
 * * `1 <= sum(paths[i].length) <= 5 * 10^5`
* * `paths[i]` consist of English letters, digits, `'/'`, `'.'`, `'('`, `')'`, and `' '`.
* * You may assume no files or directories share the same name in the same directory.
* * You may assume each given directory info represents a unique directory. A single blank space separates the directory path and file info.
*
* **Follow up:**
*
 * * Imagine you are given a real file system; how would you search for files, DFS or BFS?
* * If the file content is very large (GB level), how will you modify your solution?
* * If you can only read the file by 1kb each time, how will you modify your solution?
* * What is the time complexity of your modified solution? What is the most time-consuming part and memory-consuming part of it? How to optimize?
 * * How to make sure the duplicated files you find are not false positives? (A chunked-hashing sketch addressing these follow-ups appears after the class below.)
**/
public class Solution {
    public List<List<String>> findDuplicate(String[] paths) {
        // Maps file content to every full path holding that content
        Map<String, List<String>> map = new HashMap<>();
        for (String path : paths) {
            String[] pathComponents = path.split(" ");
            // First component is the directory; the rest are "name(content)" entries
            String root = pathComponents[0];
            for (int i = 1; i < pathComponents.length; i++) {
                int startIndex = pathComponents[i].indexOf('(');
                int endIndex = pathComponents[i].lastIndexOf(')');
                // Key on the text between the parentheses, i.e. the file content
                String content = pathComponents[i].substring(startIndex + 1, endIndex);
                map.putIfAbsent(content, new ArrayList<>());
                // Full path = directory + '/' + file name (the text before '(')
                map.get(content).add(root + "/" + pathComponents[i].substring(0, startIndex));
            }
        }
        // Keep only content groups that appear in two or more files
        List<List<String>> result = new ArrayList<>();
        for (List<String> list : map.values()) {
            if (list.size() > 1) {
                result.add(list);
            }
        }
        return result;
    }
}
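
As a quick check, here is a minimal driver (the `Main` class is a hypothetical name, not part of this artifact) that runs `findDuplicate` on Example 1 above; per the problem statement, the two groups may come back in any order.

public class Main {
    public static void main(String[] args) {
        String[] paths = {
            "root/a 1.txt(abcd) 2.txt(efgh)",
            "root/c 3.txt(abcd)",
            "root/c/d 4.txt(efgh)",
            "root 4.txt(efgh)"
        };
        // Expected (in any order):
        // [[root/a/2.txt, root/c/d/4.txt, root/4.txt], [root/a/1.txt, root/c/3.txt]]
        System.out.println(new Solution().findDuplicate(paths));
    }
}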
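For the follow-up questions, one common approach (a minimal sketch, not part of the original solution; `ContentHashing` and `fileFingerprint` are names invented here) is to group candidate files by size first and hash only the size collisions. Streaming the digest in 1 KB chunks handles GB-level files and the read-1-KB-at-a-time constraint, since only one buffer is in memory at a time; a final byte-by-byte comparison of files whose fingerprints match rules out hash-collision false positives.

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public final class ContentHashing {
    private ContentHashing() {}

    // Streams the file through SHA-256 in 1 KB chunks, so memory use stays
    // constant regardless of file size.
    public static String fileFingerprint(Path file)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[1024];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        return HexFormat.of().formatHex(digest.digest());
    }
}

Two files whose fingerprints match are duplicate candidates; as for the search itself, either DFS or BFS visits every file once, with BFS often preferred on real file systems to avoid deep recursion.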