-
Notifications
You must be signed in to change notification settings - Fork 36
Description
Hi, I wonder if you could please help as I'm encountering issues with this gem in a live project and I'm at a bit of a loss as to what is happening so if anyone can provide any help it would be greately appreciated!
Overview
The FTP integration is used by a number of old weather stations that connect via FTP, over a mobile network, and write 1 line of text every minute. I'm simply using an override for the write_file() function that looks at the filename and then passed the line to be written on to be written to the database. This was working just fine for months until recently when I started to see 'Error Connection reset by peer @ io_fillbuf - fd:{number}' errors. Once these start to appear in the logs, slowly all of the connections start to generate this error until the FTP server refuses to accept connections and the only way that seems to fix it so far is to restart the application, where it will run for a apparently random amount of time from 7 hours to over a day before this starts to happen again, requiring another restart.
System
- Debian docker instance Debian GNU/Linux 9.13 (stretch)
- ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]
- ftpd-2.1.0
What's unusual and did anything change?
- More weather stations were added but the total is only 28.
- These stations are using mobile network and are kind of slow. Mostly the session is < 1 second from connect to write and close but this can be up to 9 seconds from what I've seen.
- Apart from the weather stations connecting in, there's one connection from someone who is connecting in every 5 minutes to read the files for their own purposes, which is some legacy task from a 3rd part system.
- There is a miss-configured weather station that tries to connect every minute with the wrong credentials.
- There does not appear to be a consistent time frame between restart and the first instance of the error in the logs. Complete lockout (all connections refused) is almost daily though.
What are the settings?
It may be worth mentioning that this is running in a docker environment with mapped ports
The file system uses the default.
Initializers
require 'driver'
# Need to overwrite the write function so we can grab the data
Ftpd::DiskFileSystem.class_eval do
def write_file(ftp_path, stream, mode)
begin
File.open(expand_ftp_path(ftp_path), mode) do |file|
# Get the details from the path
arr_acc_dh = file.path.gsub(FTP_ROOT, "").split("/")
account_id = arr_acc_dh[1]
datahog_id = arr_acc_dh[2]
dh_importer = DataImporter.new()
while line = stream.read
# Ignore anything other than day data
if ftp_path.include? "daydata.txt"
dh_importer.import_data(line, account_id, datahog_id)
elsif ftp_path.include? "rssidata.txt"
dh_importer.import_ssri_data(line, account_id, datahog_id)
end
file.write line
end
end
rescue Exception => e
# Something bad happened
puts "ERROR : #{DateTime.now} : #{Thread.current.object_id} : write_file: #{e.full_message} \n"
end
end
end
# Override to set host and port
Ftpd::Session.class_eval do
def initialize_session
@logged_in = false
@data_type = 'A'
@mode = 'S'
@structure = 'F'
@data_channel_protection_level = :clear
@name_prefix = '/'
# @data_hostname = nil
# @data_port = nil
@data_hostname = '{hostname}'
@data_port = 2157
@protection_buffer_size_set = false
@epsv_all = false
close_data_server_socket
reset_failed_auths
end
end
# Strange error on stat request on live server. Debug attempt
Ftpd::CmdStat.class_eval do
def cmd_stat(argument)
reply "220 sd/ftpd 2.1.0"
end
end
Main execution block
# Set the log output and override to include thread info
skye_logger = Logger.new($stdout) #Logger.new('/var/log/ftpd.log')
skye_logger.formatter = proc do |severity, datetime, progname, msg|
"#{severity} : #{datetime} : #{Thread.current.object_id} : #{msg}\n"
end
driver = Driver.new("/var/www/shared/ftp_storage" , skye_logger) # Pass logger to driver
server = Ftpd::FtpServer.new(driver)
server.log = skye_logger
server.port = 2157
server.interface = '0.0.0.0'
server.nat_ip = '{External Server IP}'
server.passive_ports = Range.new(4021, 4060)
server.allow_low_data_ports = false
server.session_timeout = 45
server.failed_login_delay = 120
server.max_connections = 100
server.max_connections_per_ip = 1
server.tls = :off
server.start
puts "FTP Server listening on port #{server.bound_port}"
puts "FTP Server interface #{server.interface}"
begin
gets
rescue Interrupt
puts "Interrupt"
end`
Driver Code
require 'ftpd'
require "tmpdir"
class Driver
def initialize(ftp_root, logger)
@logger = logger
@ftp_root = ftp_root
@full_datahog_data_path = ''
@logger.debug "passed ftp_root #{ftp_root}"
end
def authenticate(user, password)
begin
# Expect the username to be consisting of 2 numbers account_id and hog_id.
account_id = ''
datahog_id = ''
if user.split("_").count != 2
raise "invalid login. Must consist of account_id and datahog_id e.g. '123_456'"
end
account_id = user.split("_")[0].to_i
datahog_id = user.split("_")[1].to_i
datahog = Datahog.includes(:account).references(:account).where(id: datahog_id).where({account: { id: account_id}}).where({datahogs: { ftp_password: password}}).where(account: {active: true})
if datahog.blank?
@logger.info "Invalid FTP credentials or inactive account. No matching datahog found for user: #{user} and password: #{password}"
false
else
# Create the account directory if it doesn't exist already
ftp_root_with_account = "#{@ftp_root}/#{account_id}"
unless File.exist?(ftp_root_with_account)
FileUtils.mkdir_p(ftp_root_with_account)
end
# Now the datahog directory
self.instance_variable_set(:@full_datahog_path, "#{ftp_root_with_account}/#{datahog_id}")
unless File.exist?(self.instance_variable_get(:@full_datahog_path))
FileUtils.mkdir_p(self.instance_variable_get(:@full_datahog_path))
end
self.instance_variable_set(:@full_datahog_data_path, "#{self.instance_variable_get(:@full_datahog_path)}/data")
unless File.exist?(self.instance_variable_get(:@full_datahog_data_path))
FileUtils.mkdir_p(self.instance_variable_get(:@full_datahog_data_path))
end
@logger.debug "User '#{user}'. instance var full_datahog_data_path : '#{self.instance_variable_get(:@full_datahog_data_path)}'"
true
end
rescue Exception => ex
@logger.error "FTP driver - Authentication failure #{ex.full_message}"
false
end
end
def file_system(user)
Ftpd::DiskFileSystem.new(self.instance_variable_get(:@full_datahog_path))
end
end
Modification to get full all thread stack trace
CommandLoop modified with this function
def read_and_execute_commands
catch :done do
begin
reply "220 #{server_name_and_version}"
loop do
begin
s = get_command
s = process_telnet_sequences(s)
syntax_error unless s =~ /^(\w+)(?: (.*))?$/
command, argument = $1.downcase, $2
unless valid_command?(command)
error "Syntax error, command unrecognized: #{s.chomp}", 500
end
command_sequence_checker.check command
execute_command command, argument
rescue FtpServerError => e
reply e.message_with_code
rescue => e
# Get the full back trace
puts "Error thread #{Thread.current.object_id}. Local error in processing. Error #{e.message}"
thread_count = 0
err_msg = ""
Thread.list.each do |t|
thread_count += 1
err_msg += "--- thread #{thread_count} of total #{Thread.list.size}. Thread.id #{t.object_id} backtrace begin \n"
# Lets see if we are able to pin down the culprit
# by collecting backtrace for all existing threads:
err_msg += t.backtrace.join("\n")
err_msg += "\n---thread #{thread_count} of total #{Thread.list.size}. Thread.id #{t.object_id} backtrace end \n"
end
puts "Full thread & object backtrace : - \n #{err_msg}"
reply "451 Requested action aborted. Local error in processing. Error #{e.message}"
config.exception_handler.call(e) unless config.exception_handler.nil?
end
end
rescue Errno::ECONNRESET, Errno::EPIPE
end
end
end
Working example weather station transfer
DEBUG : 2022-09-22 14:24:45 +0100 : 429400 : 220 SmartData_Skye 2.1.0
DEBUG : 2022-09-22 14:24:46 +0100 : 429400 : USER 19_51
DEBUG : 2022-09-22 14:24:46 +0100 : 429400 : 331 Password required
DEBUG : 2022-09-22 14:24:46 +0100 : 429400 : PASS **FILTERED**
DEBUG : 2022-09-22 14:24:46 +0100 : 429400 : User '19_51'. instance var full_datahog_data_path : '/var/www/shared/ftp_storage/19/51/data'
DEBUG : 2022-09-22 14:24:46 +0100 : 429400 : 230 Logged in
DEBUG : 2022-09-22 14:24:46 +0100 : 429400 : PASV
DEBUG : 2022-09-22 14:24:46 +0100 : 429400 : 227 Entering passive mode (212,110,172,113,15,234)
DEBUG : 2022-09-22 14:24:47 +0100 : 429400 : TYPE I
DEBUG : 2022-09-22 14:24:47 +0100 : 429400 : 200 Type set to I
DEBUG : 2022-09-22 14:24:47 +0100 : 429400 : CWD data
DEBUG : 2022-09-22 14:24:47 +0100 : 429400 : 250 "/data" is current directory
DEBUG : 2022-09-22 14:24:48 +0100 : 429400 : APPE 220922daydata.txt
DEBUG : 2022-09-22 14:24:48 +0100 : 429400 : 150 Opening BINARY mode data connection
DEBUG : 2022-09-22 14:24:51 +0100 : 429400 : import_data.account_id : 19 import_data.datahog_id : 51
DEBUG : 2022-09-22 14:24:51 +0100 : 429400 : 13:17:00 22.09.22 00 0425.28 01 018.69 02 004.462 03 00210 04 0000.00 05 01.4573 06 019.52 07 055.105
DEBUG : 2022-09-22 14:24:51 +0100 : 429400 : Date time found: '22.09.22 13:17:00'. Format to use '%d.%m.%y %H:%M:%S'
DEBUG : 2022-09-22 14:24:51 +0100 : 429400 : Received 126 bytes
DEBUG : 2022-09-22 14:24:51 +0100 : 429400 : 226 Transfer complete
Note: Above is the most common scenario in that I don't always see the extra quit line...
DEBUG : 2022-09-22 14:19:11 +0100 : 419900 : QUIT
DEBUG : 2022-09-22 14:19:11 +0100 : 419900 : 221 Byebye
...This is why I set the session timeout low in case connections aren't being closed.
Example of the odd one out
This connection is not made by the weather stations but by a 3rd party FTP request for their own purposes
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : 220 SmartData_Skye 2.1.0
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : USER 17_49
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : 331 Password required
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : PASS **FILTERED**
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : User '17_49'. instance var full_datahog_data_path : '/var/www/shared/ftp_storage/17/49/data'
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : 230 Logged in
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : FEAT
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : 211-Extensions supported:
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : EPRT
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : EPSV
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : MDTM
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : SIZE
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : 211 END
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : TYPE I
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : 200 Type set to I
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : CWD data
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : 250 "/data" is current directory
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : EPSV
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : 229 Entering extended passive mode (|||4054|)
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : RETR 220922daydata.txt
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : 150 Opening BINARY mode data connection
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : Sent 12789 bytes
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : 226 Transfer complete
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : MODE S
DEBUG : 2022-09-22 14:36:07 +0100 : 446940 : 200 Mode set to Strea
Full stack trace on io_fillbuf error including all threads
Included log from the start of thread id 2753380 : -