IT-Swarm.Net

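Quickly read the last line of a text file?

What is the quickest and most efficient way to read the last line of text from a [very, very large] file in Java?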

57
Jake

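Have a look at my answer to a similar question for C#. The code would be quite similar, although the encoding support is somewhat different in Java.

Basically, this is not a terribly easy thing to do in general. As MSalter points out, UTF-8 does make it easy to spot \r and \n, because the UTF-8 representation of those characters is the same as in ASCII, and those bytes never occur inside a multi-byte character.

So basically, take a buffer of (say) 2K and progressively read backwards (skip to 2K before where you were, read the next 2K), checking for a line terminator. Then skip to exactly the right place in the stream, create an InputStreamReader on top of it, and a BufferedReader on top of that. Then just call BufferedReader.readLine().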
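The steps above can be sketched roughly as follows. This is only an illustration, not the code from the linked C# answer: the class name LastLineSketch and its helper methods are made up here, and it assumes the file is UTF-8 (or another encoding in which the bytes 0x0A and 0x0D can only mean \n and \r).

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named for illustration only.
class LastLineSketch {
    private static final int CHUNK = 2048; // the "2K" buffer from the answer

    /** Returns the last line of a UTF-8 file, or null if the file is empty. */
    static String readLastLine(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long end = raf.length();
            if (end == 0) return null;
            // Ignore one trailing terminator ("\n", "\r\n" or "\r") so the
            // backwards search does not stop at the very end of the file.
            if (end > 0 && byteAt(raf, end - 1) == '\n') end--;
            if (end > 0 && byteAt(raf, end - 1) == '\r') end--;
            // Position the file at the start of the last line, then let
            // BufferedReader do the decoding and line splitting.
            raf.seek(findLineStart(raf, end));
            BufferedReader reader = new BufferedReader(new InputStreamReader(
                    Channels.newInputStream(raf.getChannel()), StandardCharsets.UTF_8));
            return reader.readLine();
        }
    }

    /** Scans backwards from 'end' in CHUNK-sized blocks for the previous line break. */
    private static long findLineStart(RandomAccessFile raf, long end) throws IOException {
        byte[] buf = new byte[CHUNK];
        long pos = end;
        while (pos > 0) {
            int len = (int) Math.min(CHUNK, pos);
            pos -= len;
            raf.seek(pos);
            raf.readFully(buf, 0, len);
            for (int i = len - 1; i >= 0; i--) {
                if (buf[i] == '\n' || buf[i] == '\r') return pos + i + 1;
            }
        }
        return 0; // no line break found: the whole file is one line
    }

    private static int byteAt(RandomAccessFile raf, long pos) throws IOException {
        raf.seek(pos);
        return raf.read();
    }
}
```

Calling LastLineSketch.readLastLine(new File("huge.log")) only touches the tail of the file, so the cost depends on the length of the last line rather than on the file size.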

18
Jon Skeet

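Below are two functions: one returns the last non-blank line of a file without loading or stepping through the entire file, and the other returns the last N lines of the file without stepping through the entire file.

What tail does is jump straight to the last character of the file, then step backwards, character by character, recording what it sees until it finds a line break. Once it finds a line break, it breaks out of the loop, reverses what was recorded, puts it into a string and returns it. 0xA is the newline and 0xD is the carriage return.

If your line endings are \r\n (CRLF) or some other "double newline" style, you will have to specify n * 2 lines to get the last n lines, because the code counts 2 line breaks for every line.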

public String tail( File file ) {
    RandomAccessFile fileHandler = null;
    try {
        fileHandler = new RandomAccessFile( file, "r" );
        long fileLength = fileHandler.length() - 1;
        StringBuilder sb = new StringBuilder();

        for(long filePointer = fileLength; filePointer != -1; filePointer--){
            fileHandler.seek( filePointer );
            int readByte = fileHandler.readByte();

            if( readByte == 0xA ) {
                if( filePointer == fileLength ) {
                    continue;
                }
                break;

            } else if( readByte == 0xD ) {
                if( filePointer == fileLength - 1 ) {
                    continue;
                }
                break;
            }

            sb.append( ( char ) readByte );
        }

        String lastLine = sb.reverse().toString();
        return lastLine;
    } catch( java.io.FileNotFoundException e ) {
        e.printStackTrace();
        return null;
    } catch( java.io.IOException e ) {
        e.printStackTrace();
        return null;
    } finally {
        if (fileHandler != null )
            try {
                fileHandler.close();
            } catch (IOException e) {
                /* ignore */
            }
    }
}

But you probably don't want just the last line, you want the last N lines, so use this instead:

public String tail2( File file, int lines) {
    java.io.RandomAccessFile fileHandler = null;
    try {
        fileHandler = 
            new java.io.RandomAccessFile( file, "r" );
        long fileLength = fileHandler.length() - 1;
        StringBuilder sb = new StringBuilder();
        int line = 0;

        for(long filePointer = fileLength; filePointer != -1; filePointer--){
            fileHandler.seek( filePointer );
            int readByte = fileHandler.readByte();

             if( readByte == 0xA ) {
                if (filePointer < fileLength) {
                    line = line + 1;
                }
            } else if( readByte == 0xD ) {
                if (filePointer < fileLength-1) {
                    line = line + 1;
                }
            }
            if (line >= lines) {
                break;
            }
            sb.append( ( char ) readByte );
        }

        String lastLine = sb.reverse().toString();
        return lastLine;
    } catch( java.io.FileNotFoundException e ) {
        e.printStackTrace();
        return null;
    } catch( java.io.IOException e ) {
        e.printStackTrace();
        return null;
    }
    finally {
        if (fileHandler != null )
            try {
                fileHandler.close();
            } catch (IOException e) {
            }
    }
}

Invoke the above methods like this:

File file = new File("D:\\stuff\\huge.log");
System.out.println(tail(file));
System.out.println(tail2(file, 10));
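The n * 2 workaround for \r\n files can be avoided by treating a \r that is immediately followed by \n as part of the same line break. Below is a minimal sketch of that idea, not a drop-in replacement for tail2: the names TailLinesSketch and lastLines are made up for illustration, and it assumes the file is UTF-8 and that the last n lines fit comfortably in memory.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named for illustration only.
class TailLinesSketch {
    /** Returns the text of the last n lines, with their original line breaks. */
    static String lastLines(File file, int n) throws IOException {
        if (n <= 0) return "";
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long end = raf.length();
            long start = 0;   // stays 0 if the file has fewer than n lines
            int breaks = 0;   // line breaks seen so far, scanning backwards
            int next = -1;    // the byte that follows the current position
            for (long pos = end - 1; pos >= 0; pos--) {
                raf.seek(pos);
                int b = raf.read();
                // "\r\n" counts once: the '\r' half is ignored when '\n' follows it.
                boolean isBreak = b == '\n' || (b == '\r' && next != '\n');
                // A terminator on the very last byte closes the last line
                // rather than separating two lines, so it is not counted.
                if (isBreak && pos != end - 1 && ++breaks == n) {
                    start = pos + 1;
                    break;
                }
                next = b;
            }
            // Read the tail as bytes and decode it in one go.
            byte[] data = new byte[Math.toIntExact(end - start)];
            raf.seek(start);
            raf.readFully(data);
            return new String(data, StandardCharsets.UTF_8);
        }
    }
}
```

With this version, lastLines(file, 10) is meant to return ten lines whether the file uses \n, \r\n or a lone \r as its line ending.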

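Warning: In the wild west of Unicode, this code can cause the output of this function to come out wrong, for example "Mary?s" instead of "Mary's". Characters with hats, accents, Chinese characters and so on may produce wrong output, because accents are added as modifiers after the character. Reversing compound characters changes the identity of the character. You will have to run a full battery of tests on all the languages you plan to use this with.

For more information about this Unicode reversal problem, read: http://msmvps.com/blogs/jon_skeet/archive/2009/11/02/omg-ponies-aka-humanity-epic-fail.aspx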
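One way to sidestep the reversal problem, assuming the file is UTF-8, is to collect the raw bytes while walking backwards, reverse the byte array, and only then decode it, so that no characters are ever reversed. The sketch below illustrates this; Utf8TailSketch and tailUtf8 are hypothetical names, and other encodings (UTF-16 for instance) would need different handling.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named for illustration only.
class Utf8TailSketch {
    /** Returns the last line of a UTF-8 file, decoded from bytes rather than cast per byte. */
    static String tailUtf8(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long end = raf.length();
            // Drop one trailing terminator ("\n", "\r\n" or "\r").
            if (end > 0 && byteAt(raf, end - 1) == 0xA) end--;
            if (end > 0 && byteAt(raf, end - 1) == 0xD) end--;

            // Collect the bytes of the last line in reverse order.
            ByteArrayOutputStream reversed = new ByteArrayOutputStream();
            for (long pos = end - 1; pos >= 0; pos--) {
                int b = byteAt(raf, pos);
                if (b == 0xA || b == 0xD) break;
                reversed.write(b);
            }

            // Reverse the bytes (not the characters), then decode once.
            byte[] bytes = reversed.toByteArray();
            for (int i = 0, j = bytes.length - 1; i < j; i++, j--) {
                byte tmp = bytes[i];
                bytes[i] = bytes[j];
                bytes[j] = tmp;
            }
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    private static int byteAt(RandomAccessFile raf, long pos) throws IOException {
        raf.seek(pos);
        return raf.read(); // 0..255, unlike readByte(), which is signed
    }
}
```

Because multi-byte sequences and combining marks are put back in their original byte order before decoding, a line such as one containing a curly apostrophe or an accented letter should come back unchanged.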

81
Eric Leschinski

Apache Commons has an implementation that uses RandomAccessFile.

It is called ReversedLinesFileReader.

28
jaco0646

Using FileReader or FileInputStream won't work. You have to use FileChannel or RandomAccessFile to loop through the file backwards from the end. As Jon said, though, encodings will be a problem.
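For the FileChannel route, a rough sketch might look like this. It relies on FileChannel's positional read(ByteBuffer, long), which reads at an absolute offset without moving the channel's own position. The names ChannelTailSketch, lastLine and readAt are invented for this example, and UTF-8 is assumed, which is exactly the encoding caveat mentioned above.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, named for illustration only.
class ChannelTailSketch {
    /** Returns the last line of a UTF-8 file ("" for an empty file). */
    static String lastLine(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long end = ch.size();
            // Drop one trailing terminator ("\n", "\r\n" or "\r").
            if (end > 0 && readAt(ch, end - 1, 1)[0] == '\n') end--;
            if (end > 0 && readAt(ch, end - 1, 1)[0] == '\r') end--;

            long start = 0;
            long pos = end;
            search:
            while (pos > 0) {
                // Step backwards one block at a time.
                int len = (int) Math.min(4096, pos);
                pos -= len;
                byte[] chunk = readAt(ch, pos, len);
                for (int i = len - 1; i >= 0; i--) {
                    if (chunk[i] == '\n' || chunk[i] == '\r') {
                        start = pos + i + 1;
                        break search;
                    }
                }
            }
            byte[] line = readAt(ch, start, Math.toIntExact(end - start));
            return new String(line, StandardCharsets.UTF_8);
        }
    }

    /** Reads exactly len bytes starting at the absolute offset pos. */
    private static byte[] readAt(FileChannel ch, long pos, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            // A positional read may return fewer bytes than requested, so loop.
            if (ch.read(buf, pos + buf.position()) < 0) {
                throw new EOFException("file shrank while reading");
            }
        }
        return buf.array();
    }
}
```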

3
Michael Borgwardt

You can easily change the code below to print only the last line.

MemoryMappedFile for printing the last 5 lines:

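private static void printByMemoryMappedFile(File file) throws FileNotFoundException, IOException{
        // try-with-resources closes both the stream and the channel.
        // A single mapping is limited to Integer.MAX_VALUE bytes (about 2 GB).
        try(FileInputStream fileInputStream=new FileInputStream(file);
            FileChannel channel=fileInputStream.getChannel()){
            long size=channel.size();
            ByteBuffer buffer=channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            int count=0;
            StringBuilder builder=new StringBuilder();
            // Walk backwards one byte at a time; lines are printed last-first.
            // The (char) cast is only correct for single-byte encodings such as ASCII.
            for(long i=size-1;i>=0;i--){
                char c=(char)buffer.get((int)i);
                if(c=='\n'){
                    if(i==size-1)continue; // skip the newline that terminates the file
                    System.out.println(builder.reverse().toString());
                    builder=new StringBuilder();
                    if(++count==5)break;
                }else{
                    builder.append(c);
                }
            }
            // Reached the start of the file before finding 5 lines: print the first line too.
            if(builder.length()>0)System.out.println(builder.reverse().toString());
        }
    }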

RandomAccessFile for printing the last 5 lines:

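private static void printByRandomAcessFile(File file) throws FileNotFoundException, IOException{
        // try-with-resources closes the file even if reading fails.
        try(RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r")){
            int lines = 0;
            StringBuilder builder = new StringBuilder();
            long length = randomAccessFile.length() - 1;
            // Walk backwards one byte at a time; lines are printed last-first.
            // The (char) cast is only correct for single-byte encodings such as ASCII.
            for(long seek = length; seek >= 0; --seek){
                randomAccessFile.seek(seek);
                char c = (char)randomAccessFile.read();
                if(c == '\n'){
                    if(seek == length){
                        continue; // skip the newline that terminates the file
                    }
                    System.out.println(builder.reverse().toString());
                    builder = new StringBuilder();
                    if (++lines == 5){
                        break;
                    }
                }else{
                    builder.append(c);
                }
            }
            // Reached the start of the file before finding 5 lines: print the first line too.
            if(builder.length() > 0){
                System.out.println(builder.reverse().toString());
            }
        }
    }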
1
Trying

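As far as I know, the fastest way to read the last line of a text file is to use the Apache FileUtils class from "org.apache.commons.io". I have a two-million-line file, and with this class it took me less than one second to find the last line. Here is my code: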

LineIterator lineIterator = FileUtils.lineIterator(new File(filePath),"UTF-8");
String lastLine="";
while (lineIterator.hasNext()){
 lastLine=  lineIterator.nextLine();
}
0
arash nadali

In C#, you should be able to set the stream's position:

From: http://bytes.com/groups/net-c/269090-streamreader-read-last-line-text-file

using(FileStream fs = File.OpenRead("c:\\file.dat"))
{
    using(StreamReader sr = new StreamReader(fs))
    {
        sr.BaseStream.Position = fs.Length - 4;
        if(sr.ReadToEnd() == "DONE")
        {
            // match
        }
    }
}
0
rball
try(BufferedReader reader = new BufferedReader(new FileReader(reqFile))) {

    String line = null;

    System.out.println("======================================");

    line = reader.readLine();       //Read Line ONE
    line = reader.readLine();       //Read Line TWO
    System.out.println("first line : " + line);

    //Length of one line if lines are of even length
    int len = line.length();       

    //skip to the end - 3 lines
    reader.skip((reqFile.length() - (len*3)));

    //Searched to the last line for the date I was looking for.

    while((line = reader.readLine()) != null){

        System.out.println("FROM LINE : " + line);
        String date = line.substring(0,line.indexOf(","));

        System.out.println("DATE : " + date);      //BAM!!!!!!!!!!!!!!
    }

    System.out.println(reqFile.getName() + " Read(" + reqFile.length()/(1000) + "KB)");
    System.out.println("======================================");
} catch (IOException x) {
    x.printStackTrace();
}
0
Ajai Singh